Azure Blob storage and HDF file storage - azure

I am in the middle of developing a cloud server and I need to store HDF files ( http://www.hdfgroup.org/HDF5/ ) using blob storage.
Functions related to creating, reading, writing, and modifying data elements within the file come from the HDF APIs.
I need to get the file path to create the file or read or write it.
Can anyone please tell me how to create a custom file on Azure Blob ?
I need to be able to use the API like shown below, but passing the Azure storage path to the file.
http://davis.lbl.gov/Manuals/HDF5-1.4.3/Tutor/examples/C/h5_crtfile.c
The files I am trying to create can get really huge (~10-20GB), so downloading them locally and modifying them is not an option for me.
Thanks
Shashi

One possible approach, admittedly fraught with challenges, would be to create the file in a temporary location using the code you included, and then use the Azure API to upload the file to Azure as a file input stream. I am in the process of researching how size restrictions are handled in Azure storage, so I can't say whether an entire 10-20GB file could be moved in a single upload operation, but since the Azure API reads from an input stream, you should be able to create a combination of operations that would result in the information you need residing in Azure storage.

Can anyone please tell me how to create a custom file on Azure Blob ?
I need to be able to use the API like shown below, but passing the
Azure storage path to the file.
http://davis.lbl.gov/Manuals/HDF5-1.4.3/Tutor/examples/C/h5_crtfile.c
Windows Azure Blob storage is a service for storing large amounts of unstructured data that can be accessed via HTTP or HTTPS. So, from the application's point of view, Azure Blob storage does not behave like a regular disk.
Microsoft provides quite good APIs (C#, Java) for working with blob storage. It also provides the Blob Service REST API for accessing blobs from any other language for which no specific blob storage API is provided (such as C++).
A single block blob can be up to 200GB, so it should easily store files of ~10-20GB.
I am afraid that the provided example will not work with Windows Azure Blob storage. However, I do not know HDF file storage; maybe it provides some Azure Blob storage support.
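To make the point about large files more concrete: a block blob is normally uploaded as a series of blocks that are committed together at the end (the Put Block and Put Block List operations of the REST API), so a 10-20GB file never has to travel in one request. The sketch below only plans the blocks; it does not talk to Azure. The names plan_blocks and Block are made up for illustration, and the block size is an arbitrary value, so check the service's current limits before relying on it.

```python
import base64
from typing import List, NamedTuple


class Block(NamedTuple):
    block_id: str  # base64 text; every ID in one blob has the same length
    offset: int    # byte offset of the block in the source file
    length: int    # number of bytes in the block


def plan_blocks(total_size: int, block_size: int) -> List[Block]:
    """Split a file of total_size bytes into consecutive blocks.

    Hypothetical helper: it only computes offsets and block IDs. Each
    block would then be read from disk and sent with the storage API.
    """
    if total_size < 0 or block_size <= 0:
        raise ValueError("total_size must be >= 0 and block_size > 0")
    blocks = []
    for index, offset in enumerate(range(0, total_size, block_size)):
        # Zero-padded counter so all IDs have equal length before encoding.
        raw_id = f"{index:08d}".encode("ascii")
        block_id = base64.b64encode(raw_id).decode("ascii")
        length = min(block_size, total_size - offset)
        blocks.append(Block(block_id, offset, length))
    return blocks


if __name__ == "__main__":
    # Illustrative numbers only: a 10 GiB file in 64 MiB blocks.
    plan = plan_blocks(10 * 1024**3, 64 * 1024**2)
    print(len(plan), plan[0], plan[-1])
```

The same plan can drive a loop that seeks to each offset, reads length bytes, and uploads that block, which keeps memory use bounded by the block size.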

Related

Azure storage options to serve content on Azure Web App

I am a total newbie to Azure WebApps and storage, and I need some clarification/confirmation. The main thing to note is that my application (described below) requires a folder hierarchy. Blob is out of the question, and file share doesn't allow anonymous access unless I use a Shared Access Signature (SAS).
Am I understanding Azure storage correctly, it's either you fit into the Azure storage model or you don't?
Can anyone advise how I can achieve what's required by the CMS application as described below by using Blobs?
The only option I see is to find a way to change the CMS application so that it always has the SAS in the URL to every file it requests from storage in order to serve content on my Web App? If so, is it a problem if I set my SAS to expire sometime in the distant future?
https://<appname>.file.core.windows.net/instance1/site1/file1.jpg?<SAS>
Problem with using Blob
So far my understanding is that Blob storage doesn't allow "sub folders" as it's a container that holds unstructured data, therefore I'm unable to use this based on my application (described below) as it requires folder structure.
The problem with using File Share
File share seemed perfect as it allows for folder hierarchy, naturally that's what I've used.
However, no anonymous access is allowed for files stored in file storage, the access needs to be authorised. One way of authorising the access is to create a SAS on a file/share level with Read permission and then using that SAS URL to access the file.
Cannot access Windows azure file storage document
My application
I've created a Linux Web App running an open source CMS application. This application allows the creation of multiple websites, with each website's content (images, docs, multimedia) stored on a file server. These files are then served to the website via a defined URL.
The CMS application has a setting for the location where it should save its files; this would be a folder on the file server. It then creates a new sub folder in that location for every site it hosts.
Example folder hierarchy
/instance1
    /site1
        /file1
        /file2
    /site2
        /file1
        /file2
Am I understanding Azure storage correctly, it's either you fit into
the Azure storage model or you don't?
You can use the Azure storage model for your CMS application, with either Blob Storage or File Share.
Can anyone advise how I can achieve what's required by the CMS
application as described below by using Blobs?
You can use a Data Lake Gen 2 storage account if you want to use Azure Blob Storage.
Data Lake Gen 2 storage enables a hierarchical namespace, so you can use subfolders in Blob Storage as your requirements demand.
Problem with using Blob
Blob Storage allows subfolders if you use a Data Lake Gen 2 storage account. You can also enable public anonymous access for blobs.
The problem with using File Share
Azure File Share supports a folder hierarchy but does not allow public anonymous access. You can use an Azure Managed Identity (system-assigned) for your web app to access the Azure File Share.
Your application would then be able to access the Azure File Share without a SAS token.
The issue of not having real folders in a blob storage shouldn't be any issue for your use case. Just because it doesn't have your traditional folders doesn't mean it can't serve content on e.g. instance1/site1/file1. That's still possible but the instance1/site1/ will just be part of the name of the blob.
Tools like the Azure Portal or Storage Explorer will actually show folders by using the delimiter / and querying data that appears to be inside a folder by using the path as prefix.
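As a rough illustration of that prefix-and-delimiter idea, here is a small sketch that works on a plain list of blob names and does not call Azure at all. The function name list_level is made up for this example; it only mimics how a tool could derive "folders" from flat names.

```python
from typing import Iterable, List, Tuple


def list_level(
    blob_names: Iterable[str], prefix: str = "", delimiter: str = "/"
) -> Tuple[List[str], List[str]]:
    """Return (virtual_folders, blobs) found directly under prefix.

    Hypothetical helper: blob names are flat strings, and a "folder"
    is just the part of a name up to the next delimiter.
    """
    folders = set()
    blobs = []
    for name in blob_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # Something follows the delimiter, so this is a deeper level.
            folders.add(prefix + head + delimiter)
        else:
            blobs.append(name)
    return sorted(folders), sorted(blobs)


if __name__ == "__main__":
    names = [
        "instance1/site1/file1",
        "instance1/site1/file2",
        "instance1/site2/file1",
        "instance1/site2/file2",
    ]
    print(list_level(names, "instance1/"))
    print(list_level(names, "instance1/site1/"))
```

So a CMS that writes to instance1/site1/file1 can keep that path unchanged; it simply becomes the blob name.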

Moving locally stored documents to Azure

I want to spike whether azure and the cloud is a good fit for us.
We have a website where users upload documents to our currently hosted website.
Every document has an equivalent record in a database.
I am using terraform to create the azure infrastructure.
What is my best way of migrating the documents from the local file path on the server to azure?
Should I be using file storage or blob storage? I am confused about the difference.
Is there anything in terraform that can help with this?
Based on your comments, I would recommend storing them in Blob Storage. This service is suited for storing and serving unstructured data like files and images. There are many other features like redundancy, archiving etc. that you may find useful in your scenario.
File Storage is more suitable in Lift-and-Shift kind of scenarios where you're moving an on-prem application to the cloud and the application writes data to either local or network attached disk.
You may also find this article useful: https://learn.microsoft.com/en-us/azure/storage/common/storage-decide-blobs-files-disks
UPDATE
Regarding uploading files from local computer to Azure Storage, there are actually many options available:
Use a Storage Explorer like Microsoft's Storage Explorer.
Use AzCopy command-line tool.
Use Azure PowerShell Cmdlets.
Use Azure CLI.
Write your own code using any available Storage Client libraries or directly consuming REST API.
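If you go the "write your own code" route, one piece you will need regardless of the client library is a mapping from each local file to a blob name, which you can also store in the matching database record. The sketch below is only that planning step, using the standard library. The function name plan_uploads and the example path are made up, and the actual upload call is left out because it depends on the library you pick.

```python
import os
from pathlib import Path
from typing import Iterator, Tuple


def plan_uploads(root: str) -> Iterator[Tuple[Path, str]]:
    """Yield (local_path, blob_name) for every file under root.

    Hypothetical helper: the blob name is the path relative to root,
    written with forward slashes so it looks the same on any OS.
    """
    root_path = Path(root)
    for dirpath, dirnames, filenames in os.walk(root_path):
        dirnames.sort()  # visit subdirectories in a stable order
        for filename in sorted(filenames):
            local_path = Path(dirpath) / filename
            blob_name = local_path.relative_to(root_path).as_posix()
            yield local_path, blob_name


if __name__ == "__main__":
    # "uploads" is an assumed local folder, not one from the question.
    for local_path, blob_name in plan_uploads("uploads"):
        print(local_path, "->", blob_name)
```

Keeping the planning separate from the upload makes it easy to do a dry run first and compare the list against the database before anything is copied.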

Which Azure Storage method is best for a temporary file transfer?

I want to automate the transfer of files from a website not hosted in Azure to my client’s premises.
I am considering having an API on the website send the files to Azure Blob Storage, and then having another API running at the client site download them.
Both would make use of the Azure storage API, which I like because it is easy to implement.
The files do not need to stay in Azure and can be deleted from storage once they are downloaded.
However I am wondering if there is a faster way.
Should I be using Hot Blob Storage or File Storage perhaps?
I looked at https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blob-storage-tiers but am still unclear as to the fastest method for my use case.
I suggest using a file share, which can be mapped locally as a network drive, making operations like read and delete easy and fast.
If you go with code only, the comparison of blob and file storage lists both at up to 60 MiB/s, so I cannot tell which is faster. There is an Azure Storage Data Movement Library, which is designed for high-performance uploading, downloading, and copying of Azure Storage blobs and files; you can use it for your purpose.
I would recommend blob storage for this application. Logic apps can also be used to automate this pipeline based on timer triggers or some other trigger.

Azure storage sync mechanisms

I have a problem that I have been wracking my brain about and figured I would need some perspective and insight from people who are a lot more knowledgeable about this.
What I have currently: A web-based application hosted in Azure uses Azure blob storage to store files that are generated as part of data import processes. We have a separate application that extends the original web application and allows users to upload files; these files are currently also stored in Azure blob storage.
Where I am trying to go: I have a requirement for the ability to map network file shares on a user's laptop and access the files that currently reside in blob storage.
Since Azure blob does not support SMB I have no way of actually doing this with a blob store.
I could use Azure Files in conjunction with a file server running the sync agent. However, this requires a lot of work in terms of refactoring and setup, plus some custom service that adds and removes permissions on the file server.
I'm wondering if there is a service or a piece of software that exists in the market currently that allows me to continue using blob and perhaps sync the blob files into a file server that can then allow users to access and open files using windows file explorer? I found one that looks like an open source project but only does a one way sync from the blob to the file share. Ideally I'd like to find a solution that does a two way sync like azure file sync does.
Any thoughts and ideas will be appreciated.
Since the maximum number of blob containers and file shares is unlimited, per my understanding you could use one of the following approaches:
Migrate the data from blob storage to an Azure file share, so that from then on the files are stored in Azure file storage.
Note: Currently you must specify the storage account key when mounting file shares; for details, you could follow this feedback. I recommend that you do not map network file shares on a user's laptop.
Keep using blob storage: create a blob container for each user and generate a SAS token for each container. The users could then use Azure Storage Explorer to manage their blob files, or use AzCopy and other command-line tools to download the blob files to their laptop's file system.
Note: For security, you could combine a stored access policy with a SAS. To revoke the permissions, you then just need to invalidate the related access policy instead of regenerating the account key. For details, see Controlling a SAS with a stored access policy and Shared Access Signatures, Part 2: Create and use a SAS with Blob storage.

Run command line EXE on Azure, supplying a path to an Azure Blob

I have an Azure web app that stores documents in Azure blob "container X".
Now, we want to run a "job" to generate specialized reports for these documents.
This includes running an EXE file that takes a document path as argument, letting it generate a report on the file system, and uploading this to Azure blob "container Y".
Like: generate-report.exe document.doc generates report.txt.
How can this be done? Do we need to download the blob to the web app, or is it possible to somehow refer to a blob as we refer to a physical disk file?
You cannot refer to a blob as a local file object, since blob storage does not implement any type of file I/O abstraction layer. Yes, you can use File Storage service, which implements SMB on top of blob storage, but this is different than working with individual blobs.
If your app is built to deal just with file objects, you'd need to copy it from blob storage to local disk first, then upload the results from local disk to blob storage.
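A minimal sketch of that copy, run, upload flow is shown below. It is not tied to any Azure library: the download and upload steps are passed in as callables, and the names generate_report, download_to, upload_from, and build_command are made up for illustration. It assumes, as in the question, that the tool takes the document path as its argument and writes report.txt into its working directory.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Sequence


def generate_report(
    download_to: Callable[[Path], None],
    upload_from: Callable[[Path], None],
    build_command: Callable[[Path], Sequence[str]],
    document_name: str = "document.doc",
    report_name: str = "report.txt",
) -> None:
    """Copy a blob to local disk, run the tool on it, upload the result.

    Hypothetical helper: download_to would fetch the blob from
    container X into the given path, and upload_from would send the
    given file to container Y, using whichever storage client you use.
    """
    with tempfile.TemporaryDirectory() as workdir:
        document = Path(workdir) / document_name
        report = Path(workdir) / report_name
        download_to(document)
        # Assumption: the tool writes report_name into its working directory.
        subprocess.run(build_command(document), check=True, cwd=workdir)
        upload_from(report)
    # The temporary directory and both files are removed at this point.
```

In a real job, build_command would return something like ["generate-report.exe", str(document)]. Using a temporary directory per run keeps concurrent jobs from overwriting each other's report.txt.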
If you're just building your app, you can directly access blob content via the REST API (or one of the various language-specific SDKs that wrap the API).
A blob can be read as a stream, which can then be used to create the text file inside the web app.
You can also create WebJobs under the web app to run this task in the background.
https://azure.microsoft.com/en-in/documentation/articles/storage-dotnet-how-to-use-blobs/
