Which Azure Storage method is best for a temporary file transfer?

I want to automate the transfer of files from a website not hosted in Azure to my client’s premises.
I am considering having an API on the website send the files to Azure Blob Storage, and then having another API, running at the client site, download them.
Both would make use of the Azure storage API, which I like because it is easy to implement.
The files do not need to stay in Azure and can be deleted from storage once they are downloaded.
However, I am wondering if there is a faster way.
Should I be using hot-tier Blob Storage, or perhaps File Storage?
I looked at https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blob-storage-tiers but am still unclear as to the fastest method for my use case.

I suggest using an Azure file share, which can be mounted locally as a mapped drive, so operations like read and delete are easy and fast.
If you go with code only: the comparison of Blob and File storage lists both at up to 60 MiB/s, so I cannot tell which is faster. There is also the Azure Storage Data Movement Library, which is designed for high-performance uploading, downloading, and copying of Azure Storage blobs and files; you can use it for your purpose.

I would recommend Blob storage for this application. Logic Apps can also be used to automate this pipeline based on a timer trigger or some other trigger.
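If you do go with Blob storage, the upload/download/delete round trip the question describes is only a few calls. Below is a minimal sketch using the Python azure-storage-blob SDK (v12); the connection string, container name, and file paths are placeholders, and the same flow is available in the other storage client libraries.

```python
# Sketch of the temporary-transfer flow with the Python azure-storage-blob SDK (v12).
# Connection string, container and blob names below are placeholders.
from azure.storage.blob import BlobServiceClient

CONNECTION_STRING = "<storage-account-connection-string>"  # placeholder
CONTAINER = "transfer"                                      # hypothetical container name

service = BlobServiceClient.from_connection_string(CONNECTION_STRING)
container = service.get_container_client(CONTAINER)

# Website side: upload a generated file.
def upload_file(local_path: str, blob_name: str) -> None:
    with open(local_path, "rb") as data:
        container.upload_blob(name=blob_name, data=data, overwrite=True)

# Client side: download the file, then delete the blob since it is only transient.
def download_and_delete(blob_name: str, local_path: str) -> None:
    blob = container.get_blob_client(blob_name)
    with open(local_path, "wb") as out:
        out.write(blob.download_blob().readall())
    blob.delete_blob()
```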

Related

Moving locally stored documents to Azure

I want to spike whether Azure and the cloud are a good fit for us.
We have a website, currently hosted on our own server, where users upload documents.
Every document has an equivalent record in a database.
I am using Terraform to create the Azure infrastructure.
What is my best way of migrating the documents from the local file path on the server to azure?
Should I be using File storage or Blob storage? I am confused about the difference.
Is there anything in terraform that can help with this?
Based on your comments, I would recommend storing them in Blob Storage. This service is suited for storing and serving unstructured data like files and images. There are many other features like redundancy, archiving etc. that you may find useful in your scenario.
File Storage is more suitable in Lift-and-Shift kind of scenarios where you're moving an on-prem application to the cloud and the application writes data to either local or network attached disk.
You may also find this article useful: https://learn.microsoft.com/en-us/azure/storage/common/storage-decide-blobs-files-disks
UPDATE
Regarding uploading files from local computer to Azure Storage, there are actually many options available:
Use a GUI tool such as Microsoft's Azure Storage Explorer.
Use AzCopy command-line tool.
Use Azure PowerShell Cmdlets.
Use Azure CLI.
Write your own code using any of the available storage client libraries or by consuming the REST API directly (a sketch follows below).
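For the last option, here is a rough sketch (assuming the Python azure-storage-blob SDK, v12) that walks a local folder on the server and uploads every document to a container, preserving the relative path as the blob name. The folder, container name, and connection string are placeholders.

```python
# Hypothetical sketch: upload every document under a local folder to a blob
# container, keeping the relative path as the blob name.
import os
from azure.storage.blob import BlobServiceClient

CONNECTION_STRING = "<storage-account-connection-string>"  # placeholder
LOCAL_ROOT = r"D:\site\documents"                           # hypothetical document folder
CONTAINER = "documents"                                     # hypothetical container

service = BlobServiceClient.from_connection_string(CONNECTION_STRING)
container = service.get_container_client(CONTAINER)

for dirpath, _, filenames in os.walk(LOCAL_ROOT):
    for name in filenames:
        local_path = os.path.join(dirpath, name)
        # Use the path relative to the root as the blob name, with forward slashes.
        blob_name = os.path.relpath(local_path, LOCAL_ROOT).replace(os.sep, "/")
        with open(local_path, "rb") as data:
            container.upload_blob(name=blob_name, data=data, overwrite=True)
```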

Azure storage sync mechanisms

I have a problem that I have been wracking my brain about and figured I would need some perspective and insight from people who are a lot more knowledgeable about this.
What I have currently: a web-based application hosted in Azure uses Azure Blob storage to store files that are generated as part of data import processes. We have a separate application that extends the original web application and allows users to upload files; these files are currently also stored in Azure Blob storage.
Where I am trying to go: I have a requirement to map network file shares on a user's laptop and be able to access the files that currently reside in the blob store.
Since Azure Blob storage does not support SMB, I have no way of actually doing this with a blob store.
I could use Azure Files in conjunction with a file server running the sync agent. However, this requires a lot of work in terms of refactoring, setup, and a custom service that adds/removes permissions on the file server.
I'm wondering if there is a service or a piece of software on the market that allows me to keep using Blob storage and sync the blob files into a file server, so that users can access and open the files using Windows File Explorer. I found one project that looks open source, but it only does a one-way sync from the blob store to the file share. Ideally I'd like to find a solution that does a two-way sync like Azure File Sync does.
Any thoughts and ideas will be appreciated.
Since the maximum number of blob containers and file shares is unlimited, per my understanding you could take one of the following approaches:
Migrate the data from Blob storage to an Azure file share and use the file share as the file store from then on.
Note: currently you must specify the storage account key when mounting file shares (details in this feedback item), so I would recommend against mapping network file shares on users' laptops.
You could still use Blob storage: create a blob container per user and generate a container-level SAS token for each user; the users can then use Azure Storage Explorer to manage their blob files, or use AzCopy and other command-line tools to download the blobs to their laptop file system.
Note: for security, you can combine a stored access policy with the SAS; to revoke the permissions you then only need to invalidate the related access policy instead of regenerating the account key (a sketch follows below). For details, see Controlling a SAS with a stored access policy and Shared Access Signatures, Part 2: Create and use a SAS with Blob storage.
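As a sketch of the stored-access-policy idea (assuming the Python azure-storage-blob SDK, v12; the account name/key, container, and policy name are placeholders), you define the policy on the per-user container and issue a SAS that references it, so revoking access later only means removing the policy:

```python
# Sketch: per-user container SAS backed by a stored access policy.
from datetime import datetime, timedelta, timezone
from azure.storage.blob import (
    BlobServiceClient, AccessPolicy, ContainerSasPermissions, generate_container_sas,
)

ACCOUNT_NAME = "<account-name>"          # placeholder
ACCOUNT_KEY = "<account-key>"            # placeholder
USER_CONTAINER = "user-alice"            # hypothetical per-user container

service = BlobServiceClient(
    account_url=f"https://{ACCOUNT_NAME}.blob.core.windows.net",
    credential=ACCOUNT_KEY,
)
container = service.get_container_client(USER_CONTAINER)

# 1) Define a stored access policy on the container.
policy = AccessPolicy(
    permission=ContainerSasPermissions(read=True, list=True, write=True),
    expiry=datetime.now(timezone.utc) + timedelta(days=30),
)
container.set_container_access_policy(signed_identifiers={"user-policy": policy})

# 2) Issue a SAS token that references the policy (no inline permissions/expiry).
sas = generate_container_sas(
    account_name=ACCOUNT_NAME,
    container_name=USER_CONTAINER,
    account_key=ACCOUNT_KEY,
    policy_id="user-policy",
)
print(f"https://{ACCOUNT_NAME}.blob.core.windows.net/{USER_CONTAINER}?{sas}")
# To revoke access later, remove "user-policy" from the container's access policy.
```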

Can Azure Data Factory write to FTP

I want to write the output of a pipeline to an FTP folder. ADF seems to support on-premises file systems but not FTP folders.
How can I write the output in text format to an FTP folder?
Unfortunately, FTP servers are not a supported data store for ADF as of right now, so there is no out-of-the-box (OOTB) way to interact with an FTP server for either reading or writing.
However, you can use a custom activity to make it possible, but it will require some custom development to make this happen. A fellow Cloud Solution Architect within MS put together a blog post that talks about how he did it for one of his customers. Please take a look at the following:
https://blogs.msdn.microsoft.com/cloud_solution_architect/2016/07/02/creating-ftp-data-movement-activity-for-azure-data-factory-pipeline/
I hope that this helps.
Upon thinking about it, you might be able to achieve what you want in a mildly convoluted way by writing the output to an Azure Blob storage account and then either
1) manually: downloading the file from the Blob storage account and pushing it to the "FTP" site, or
2) automatically: using the Azure CLI to pull the file locally and then pushing it to the "FTP" site with a batch or shell script as appropriate.
This is a lighter-weight approach than custom activities (which are certainly the better option for heavy work).
You may wish to consider using Azure Functions to write to FTP (note there is a timeout when using a Consumption plan, but not in other plans, so it will depend on how big the files are).
https://learn.microsoft.com/en-us/azure/azure-functions/functions-create-storage-blob-triggered-function
You could instruct Data Factory to write to an intermediary Blob storage container.
Then use a Blob storage trigger in Azure Functions to upload the files to FTP as soon as they appear in Blob storage (a sketch follows below).
Or alternatively, write to Blob storage and then use a timer in Logic Apps to upload from Blob storage to FTP. Logic Apps hide a tremendous amount of power behind their friendly exterior.
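As a rough illustration of the blob-trigger option, here is a sketch of an Azure Function in the Python v2 programming model that pushes each new blob in an intermediate container to an FTP server. The container name ("outbox"), the FTP_HOST/FTP_USER/FTP_PASS app settings, and the trigger connection are all hypothetical placeholders.

```python
# Sketch: blob-triggered Azure Function (Python v2 model) that forwards each new
# blob in the intermediate "outbox" container to an FTP server.
import io
import os
import ftplib
import azure.functions as func

app = func.FunctionApp()

@app.blob_trigger(arg_name="blob", path="outbox/{name}", connection="AzureWebJobsStorage")
def push_to_ftp(blob: func.InputStream) -> None:
    # Read the blob that Data Factory just wrote to the intermediate container.
    payload = blob.read()
    filename = os.path.basename(blob.name)  # blob.name includes the container prefix

    # Push it to the FTP server; FTP_HOST/USER/PASS are hypothetical app settings.
    with ftplib.FTP(os.environ["FTP_HOST"]) as ftp:
        ftp.login(os.environ["FTP_USER"], os.environ["FTP_PASS"])
        ftp.storbinary(f"STOR {filename}", io.BytesIO(payload))
```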
You can write a Logic App that will pick your file up from Azure Storage and send it to an FTP site. Then call the Logic App using a Data Factory Web activity.
Make sure you do some error handling in your Logic App so it returns a 400 if the FTP transfer fails.

Azure Blob storage and HDF file storage

I am in the middle of developing a cloud server and I need to store HDF files (http://www.hdfgroup.org/HDF5/) using Blob storage.
Functions related to creating, reading, writing, and modifying data elements within the file come from the HDF APIs.
I need to get the file path to create the file or read or write it.
Can anyone please tell me how to create a custom file on Azure Blob ?
I need to be able to use the API like shown below, but passing the Azure storage path to the file.
http://davis.lbl.gov/Manuals/HDF5-1.4.3/Tutor/examples/C/h5_crtfile.c
These files I am trying to create can get really huge (~10-20 GB), so downloading them locally and modifying them is not an option for me.
Thanks
Shashi
One possible approach, admittedly fraught with challenges, would be to create the file in a temporary location using the code you included, and then use the Azure API to upload the file to Azure as a file input stream. I am in the process of researching how size restrictions are handled in Azure storage, so I can't say whether an entire 10-20GB file could be moved in a single upload operation, but since the Azure API reads from an input stream, you should be able to create a combination of operations that would result in the information you need residing in Azure storage.
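A sketch of that approach (assuming h5py for the HDF5 side and the Python azure-storage-blob SDK for the upload; the temporary path, container, and dataset names are placeholders) might look like this:

```python
# Sketch of the approach described above: create the HDF5 file in a temporary
# local location with the HDF5 API (h5py here), then stream it up to Blob storage.
import h5py
from azure.storage.blob import BlobServiceClient

CONNECTION_STRING = "<storage-account-connection-string>"  # placeholder
TMP_PATH = "/tmp/output.h5"                                 # hypothetical temporary location

# 1) Create the file with the HDF5 API (mirrors the h5_crtfile.c tutorial).
with h5py.File(TMP_PATH, "w") as f:
    f.create_dataset("dset", shape=(4, 6), dtype="i4")

# 2) Upload the finished file to a block blob; the SDK splits large files into
#    blocks, and max_concurrency lets it send several blocks in parallel.
service = BlobServiceClient.from_connection_string(CONNECTION_STRING)
blob = service.get_blob_client(container="hdf-files", blob="output.h5")
with open(TMP_PATH, "rb") as data:
    blob.upload_blob(data, overwrite=True, max_concurrency=4)
```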
Windows Azure Blob storage is a service for storing large amounts of unstructured data that can be accessed via HTTP or HTTPS. So from an application's point of view, Azure Blob storage does not work like a regular disk.
Microsoft provides quite good APIs (C#, Java) for working with Blob storage. They also provide the Blob Service REST API for accessing blobs from any other language for which a specific Blob storage API is not provided (such as C++).
A single block blob can be up to 200 GB, so it should easily store files of ~10-20 GB.
I am afraid that the provided example will not work with Windows Azure Blob. However, I do not know HDF file storage; maybe they provide some Azure Blob storage support.

Upload 650,000 documents to Azure

I can't seem to find any reference to bulk uploading data to Azure.
I have a document store with 650,000 PDF documents that take up about 1.2 TB of disk space.
Uploading those files to Azure via the web will be difficult. Is there a way I can mail a hard drive and have your team upload them for me?
If not can you recommend the best way to upload this many documents?
Maybe not the answer you expected, but you could use Amazon's AWS Import/Export (this allows you to mail them an HDD and they'll import it into your S3 account).
To transfer the data to a Windows Azure storage account you can leverage one of the new features in the 1.7.1 SDK: the StartCopyFromBlob method. This method allows you to copy a file at a specific URL in an asynchronous way (you could use this to copy all files from your S3 bucket to your Azure storage account).
Read the following blogpost for a fully working example: How to Copy a Bucket from Amazon S3 to Windows Azure Blob Storage using “Copy Blob”
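StartCopyFromBlob comes from the older .NET SDK; purely for illustration, a roughly equivalent server-side copy in the current Python SDK is BlobClient.start_copy_from_url, sketched below. The source URL must be readable by the service (for example an S3 pre-signed URL); all names here are placeholders.

```python
# Sketch: asynchronous, server-side copy of a file at a URL into Blob storage.
import time
from azure.storage.blob import BlobServiceClient

CONNECTION_STRING = "<storage-account-connection-string>"  # placeholder
SOURCE_URL = "https://my-bucket.s3.amazonaws.com/report.pdf?<pre-signed-query>"  # placeholder

service = BlobServiceClient.from_connection_string(CONNECTION_STRING)
dest = service.get_blob_client(container="imported", blob="report.pdf")

# Kick off the copy; Azure pulls the bytes itself, nothing flows through this machine.
dest.start_copy_from_url(SOURCE_URL)

# Poll until the service reports the copy as finished.
while dest.get_blob_properties().copy.status == "pending":
    time.sleep(5)
print(dest.get_blob_properties().copy.status)  # "success" (or "failed"/"aborted")
```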
While Azure doesn't offer a physical ingestion process today, if you talk nicely to the Azure team they can do this as a one-off. If you like, I can get you a contact on the product team (dave at greenbutton dot com).
Alternatively, there are solutions such as Aspera, which provides accelerated data transfer over UDP and is being beta tested in Azure alongside the Azure Media Services offering.
We have some tools that help with this as well (http://www.greenbutton.com); they leverage Aspera's technology.
As disk shipment is not supported by Windows Azure, your best bet is to use a third-party application (or write your own) that supports parallel upload; this way you can still upload much faster. Third-party applications like Gladinet and CloudBerry could be used to upload the data, but I am not sure how configurable they are for maximizing parallelism to achieve the fastest upload.
If you decide to write it yourself, here is a starting point: Asynchronous Parallel Block Blob Transfers with Progress Change Notification (a sketch of the idea follows below).
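If you do write your own parallel uploader, a simple thread-pool sketch (assuming the Python azure-storage-blob SDK; the folder, container, and connection string are placeholders) could look like this:

```python
# Sketch: push many local PDFs into a container concurrently with a thread pool.
import os
from concurrent.futures import ThreadPoolExecutor
from azure.storage.blob import BlobServiceClient

CONNECTION_STRING = "<storage-account-connection-string>"  # placeholder
LOCAL_ROOT = r"D:\documents"                                # hypothetical folder with the PDFs
CONTAINER = "documents"                                     # hypothetical container

service = BlobServiceClient.from_connection_string(CONNECTION_STRING)
container = service.get_container_client(CONTAINER)

def upload_one(local_path: str) -> None:
    # Keep the relative path as the blob name so the folder structure survives.
    blob_name = os.path.relpath(local_path, LOCAL_ROOT).replace(os.sep, "/")
    with open(local_path, "rb") as data:
        container.upload_blob(name=blob_name, data=data, overwrite=True)

pdfs = [
    os.path.join(dirpath, name)
    for dirpath, _, names in os.walk(LOCAL_ROOT)
    for name in names
    if name.lower().endswith(".pdf")
]

# A modest pool keeps many uploads in flight without overwhelming the uplink.
with ThreadPoolExecutor(max_workers=16) as pool:
    list(pool.map(upload_one, pdfs))
```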
I know this is a bit too late for the OP, but in the Azure Management Portal, under Storage, pick your storage instance, then click the Import/Export link at the top. At the bottom of that screen, there is a "Create Import Job" link and icon. Also, if you click the blue help icon on the far right side, it says this:
You can use the Windows Azure Import/Export service to transfer large amounts of file data to Windows Azure Blob storage in situations where uploading over the network is prohibitively expensive or infeasible. You can also use the Import/Export service to transfer large quantities of data resident in Blob storage to your on-premises installations in a timely and cost-effective manner. Use the Windows Azure Import/Export Service to Transfer Data to Blob Storage
To transfer a large set of file data into Blob storage, you can send one or more hard drives containing that data to a Microsoft data center, where your data will be uploaded to your storage account. Similarly, to export data from Blob storage, you can send empty hard drives to a Microsoft data center, where the Blob data from your storage account will be copied to your hard drives and then returned to you. Before you send in a drive that contains data, you'll encrypt the data on the drive; when Microsoft exports your data to send to you, the data will also be encrypted before shipping.
Both Windows Azure Storage PowerShell and AzCopy can bulk-upload data to Azure.
For Azure Storage PowerShell, you could use ls -File -Recurse | Set-AzureStorageBlobContent -Container upload.
You can refer to http://msdn.microsoft.com/en-us/library/dn408487.aspx for more details.
For AzCopy, you can refer to this article: http://blogs.msdn.com/b/windowsazurestorage/archive/2012/12/03/azcopy-uploading-downloading-files-for-windows-azure-blobs.aspx
