Is it possible to mount Azure Data Lake Store or Azure Blob Storage as a drive on a Windows or Linux VM?

My task is to migrate our data store, which is currently located on a network drive, to Azure Data Lake Store or Blob Storage, as well as to migrate the ingestion and post-processing software.
If I can mount Azure Data Lake Store or Blob Storage as a drive, it would make my task much easier.

You can easily mount an Azure File Share to Windows and Linux boxes: https://learn.microsoft.com/en-us/azure/storage/files/storage-how-to-use-files-windows
Additionally, if you're looking to go directly to Blob storage from a Linux box, you can use the blobfuse FUSE adapter (https://learn.microsoft.com/en-us/azure/storage/blobs/storage-how-to-mount-container-linux and https://azure.microsoft.com/en-us/blog/linux-fuse-adapter-for-blob-storage/).
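For illustration, here is a minimal sketch of what your ingestion code would see once the file share (or a blobfuse-mounted container) is attached; /mnt/azurefiles is just an assumed mount point, not something the links above create for you:

```python
from pathlib import Path

# Assumed mount point; substitute whatever path you mounted the file share
# (or blobfuse container) under.
MOUNT_POINT = Path("/mnt/azurefiles")

def write_result(name: str, data: bytes) -> Path:
    """Write a file through the mounted share as if it were a local disk."""
    target = MOUNT_POINT / "results" / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_bytes(data)
    return target

def read_result(name: str) -> bytes:
    """Read it back the same way any local-file code would."""
    return (MOUNT_POINT / "results" / name).read_bytes()

if __name__ == "__main__":
    path = write_result("sample.txt", b"hello from the ingestion job")
    print(f"wrote {path}, read back {len(read_result('sample.txt'))} bytes")
```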

I would advise against this approach. It makes more sense to abstract the storage details in your software so that your application has no knowledge of what type of storage is being used. It sounds like you have a massive coupling issue (technical debt) as your root cause. While mounting may work, it may not scale, so your mileage may vary.
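A rough sketch of that kind of abstraction, assuming the azure-storage-blob package; the DataStore interface and class names are purely illustrative, not an existing library:

```python
from abc import ABC, abstractmethod
from pathlib import Path

from azure.storage.blob import BlobServiceClient  # pip install azure-storage-blob


class DataStore(ABC):
    """Illustrative storage interface: callers never know what backs it."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class LocalDiskStore(DataStore):
    """Backed by the existing network drive (or any local path)."""

    def __init__(self, root: Path):
        self.root = root

    def put(self, key: str, data: bytes) -> None:
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


class BlobStore(DataStore):
    """Backed by Azure Blob Storage; connection string and container come from config."""

    def __init__(self, connection_string: str, container: str):
        self.service = BlobServiceClient.from_connection_string(connection_string)
        self.container = container

    def put(self, key: str, data: bytes) -> None:
        self.service.get_blob_client(self.container, key).upload_blob(data, overwrite=True)

    def get(self, key: str) -> bytes:
        return self.service.get_blob_client(self.container, key).download_blob().readall()
```

The ingestion and post-processing code only ever sees DataStore, so swapping the network drive for Blob storage (or anything else) becomes a configuration change rather than a rewrite.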

You can mount Azure Blob Storage or Data Lake using goofys: https://github.com/kahing/goofys/blob/master/README-azure.md

Related

Faster blob storage copy tools across regions

I need to copy containers in Blob Storage across regions and wanted a solution that would do it without having to download the data locally and then upload it again. For example, I am trying to copy a container from East US to a container in Southeast Asia. I used AzCopy and the throughput I got was 22 Mb/s at best. I am not using /SyncCopy either, so is this the best throughput the tool provides cross-region? Are there any other external tools that provide faster results? Thanks.
AzCopy is your best bet when it comes to rapid data movement within Azure. You could also consider the Azure Import/Export service if you have an urgent timeline for a large amount of data transfer:
You can use the Azure Import/Export service to securely transfer large amounts of data to Azure Blob storage and Azure Files by shipping disk drives to an Azure data center. This service can also be used to transfer data from Azure Storage to hard disk drives and ship them to your on-premises sites. Data from a single internal SATA disk drive can be imported either to Azure Blob storage or Azure Files.
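For reference, the server-side copy that AzCopy drives is also exposed through the storage SDK, so nothing has to be downloaded locally. A minimal Python sketch (the connection strings and container names are placeholders, and the source blobs are assumed to be readable by the destination account, e.g. via public access or a SAS appended to the URL):

```python
from azure.storage.blob import BlobServiceClient  # pip install azure-storage-blob

# Placeholder connection strings for the two storage accounts.
src = BlobServiceClient.from_connection_string("<east-us-connection-string>")
dst = BlobServiceClient.from_connection_string("<southeast-asia-connection-string>")

src_container = src.get_container_client("images")
dst_container = dst.get_container_client("images")

for blob in src_container.list_blobs():
    # The copy itself runs inside Azure Storage; this script only issues requests.
    source_url = f"{src_container.url}/{blob.name}"
    dst_container.get_blob_client(blob.name).start_copy_from_url(source_url)
```

Throughput of such cross-region copies is still governed by the storage service, so this is about convenience rather than a guaranteed speed-up over AzCopy.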
There are also some external tools:
https://www.signiant.com/signiant-flight-for-fast-large-file-transfers-to-azure-blob-storage/
and:
http://asperasoft.com/fast-file-transfer-with-aspera-sod-azure/
https://learn.microsoft.com/en-us/azure/storage/common/storage-import-export-service
https://learn.microsoft.com/en-us/azure/storage/common/storage-moving-data

Is it possible to mount an Azure datalake as a drive on a linux server?

Our end goal is for our Linux VM servers to access the Azure Data Lake directly as a mounted filesystem. Microsoft states that Azure Data Lake is HDFS-compatible, so we were wondering whether it is possible to mount it directly through something like FUSE, or indirectly through a Hadoop system.
Anything available in Azure goes.
Desperately looking for examples from somebody who has done this.
goofys supports mounting Azure Data Lake: https://github.com/kahing/goofys/blob/master/README-azure.md#azure-blob-storage
Presently, it is not possible to mount an Azure Data Lake Store account as a drive on a Linux server.
Please add a feature request at http://aka.ms/adlfeedback.
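If programmatic access is acceptable instead of a true mount, the store can also be driven with filesystem-style calls from the azure-datalake-store Python SDK. A minimal sketch (the account name and service-principal credentials are placeholders, and this targets Data Lake Store Gen1):

```python
from azure.datalake.store import core, lib  # pip install azure-datalake-store

# Placeholder service-principal credentials and account name.
token = lib.auth(tenant_id="<tenant-id>",
                 client_id="<app-id>",
                 client_secret="<app-secret>")
adls = core.AzureDLFileSystem(token, store_name="<adls-account-name>")

print(adls.ls("/"))                              # list the root "directory"
with adls.open("/data/example.csv", "rb") as f:  # read a file like a local one
    head = f.read(1024)
```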

How to write to a tmp/temp directory in Windows Azure website

How would I write to a tmp/temp directory in a Windows Azure website? I can write to a blob, but I'm using an npm package that requires me to give it file names so that it can write directly to those files.
Are you using Cloud Services (PaaS) or Virtual Machines (IaaS)?
If PaaS, look at Windows Azure Local Storage. This option gives you up to 250 GB of disk space per core. It's a great location for temporary storage of information in a way that traditional apps will be familiar with. However, it's not persistent, so anything you put there that needs to survive the VM instance being repaved should be copied to Blob storage. Also, this storage is specific to a given role instance, so if you have two instances of the same role, they each have their own local storage buckets.
Alternatively, you can use Azure Drive, which allows you to keep the information persisted, but still doesn't allow multiple parallel writes.
If IaaS, then you can just mount a data disk to the VM and write to it directly. Data disks are already persisted to blob storage so there's little risk of data loss.
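In either case, the usual pattern is to hand the library a real path under the temp directory and then copy anything you need to keep to Blob storage yourself. A minimal sketch of that pattern in Python (the connection string, container, and file names are placeholders):

```python
import os
import tempfile

from azure.storage.blob import BlobServiceClient  # pip install azure-storage-blob

# tempfile honours the TEMP/TMP environment variables, so this resolves to a
# writable local folder; hand tmp_path to the library that insists on a filename.
tmp_path = os.path.join(tempfile.gettempdir(), "render-output.pdf")
with open(tmp_path, "wb") as f:
    f.write(b"generated content")  # stand-in for whatever the library produces

# The temp folder is not durable, so copy anything worth keeping to Blob storage.
service = BlobServiceClient.from_connection_string("<connection-string>")
with open(tmp_path, "rb") as f:
    service.get_blob_client("outputs", "render-output.pdf").upload_blob(f, overwrite=True)
```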
This is just my understanding, so please correct me if anything is wrong.
In Windows Azure Web Sites, the content of your website is stored in blob storage and mounted as a drive, which is shared by all instances your web site is using. Since it's in blob storage, it's persistent. So if you need the local file system, I think you can use the folders under your web site's root path, but I don't think you can use the system tmp or temp folder.

Dynamic file hosting on Azure

I am using Windows Azure for a custom blog implementation. The blog uses CKEditor and the CKFinder file management plugin. Typically the file management plugin connects to a file system directory to store the files. I need to store these as if it were a local directory and serve them through HTTP requests. In Azure you cannot rely on the file system to persist through recycles.
I assume I am supposed to use Azure Storage, but I am at a loss as to how to do this. Is there a way to "mount" these storage systems to the file system? Am I correct in my assumption to use storage? If not, any guidance as to what I am missing?
Thanks
Or, you could use AzureBlobDrive to mount blob storage as a drive in Azure directly (no VHD, no limitation on only one instance being able to write).
https://github.com/richorama/AzureBlobDrive
You can actually mount a page blob as an NTFS drive, which is then a "durable drive" (just like any other blob), and you access it via a drive letter, just like a locally-attached (but volatile) drive.
The issue is that, using mounted drives, you may only have one writer, so this might cause challenges when scaling to multiple instances.
Take a look at this MSDN post to see an example of mounting a drive. Notice that, while the example doesn't set up any cache, you can specify a cache size. The cache is stored on a local disk resource.
EDIT: For a tutorial, download the Windows Azure Training Kit. Go to hands-on labs, and open Exploring Windows Azure Storage. Check out Exercise 4: Working with Drives.
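If changing the plugin's storage backend is an option, the directory-plus-HTTP behaviour can also be emulated directly against Blob storage by treating blob name prefixes as folders. A minimal sketch using the azure-storage-blob package (the connection string and container name are placeholders, and the container is assumed to allow public read access so the files can be served over HTTP):

```python
from azure.storage.blob import BlobServiceClient, ContentSettings  # pip install azure-storage-blob

service = BlobServiceClient.from_connection_string("<connection-string>")
container = service.get_container_client("uploads")  # assumed public-read container

def save_upload(folder: str, filename: str, data: bytes, content_type: str) -> str:
    """Store a file under a folder-like prefix and return its HTTP URL."""
    blob = container.get_blob_client(f"{folder}/{filename}")
    blob.upload_blob(data, overwrite=True,
                     content_settings=ContentSettings(content_type=content_type))
    return blob.url

def list_folder(folder: str) -> list[str]:
    """Emulate a directory listing with a name prefix."""
    return [b.name for b in container.list_blobs(name_starts_with=f"{folder}/")]
```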

Azure Blob storage vs Azure Drive

I am looking at moving to Windows Azure rather than typical hosting; however, I'm unsure how best to store images. After searching I've found that there are two possible solutions: Blob storage or Azure Drive.
I have looked into Blob storage and, although I have begun to get used to the idea, it will require quite a lot of modification to our CMS. In my searching I have just stumbled across Azure Drive which, if I understand correctly, creates a virtual hard drive that allows your application to run as it would on a normal server.
Are there any disadvantages to Azure Drive over Blob storage? It sounds like migrating current applications to Azure will be much easier with Azure Drive than with Blob storage, but I just wanted to check that there aren't any major flaws in this.
Thanks
Pat
Yes, there are quite a few differences. First, a Windows Azure drive is actually a VHD uploaded as a page blob and mounted by a driver to provide an NTFS partition. So, to get any data on it, you must mount it (or a snapshot); the data is not directly accessible without mounting.
Next, drives can only be mounted read/write by one instance. If you want anything else to even read that drive, you must snapshot it and mount the snapshot, which introduces a 'staleness' problem for the read-only instances mounting snapshots. You can work around this with an SMB share, but that is slightly complicated.
You would lose the ability to automatically get CDN capabilities if you used a drive as well.
Drives are great for their intended purpose - getting applications that must use NTFS to work in Windows Azure.
If you were to use blobs natively, you would a.) let the storage subsystem scale and take the load of serving the data off your instances, and b.) be able to use the CDN to get geo-scale on the images as well.
While it is some work, I would strongly recommend putting images in blob storage. It is ideal for it.
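As a small illustration of points a.) and b.), serving images from Blob storage (or a CDN endpoint placed in front of the container) is just a matter of emitting storage URLs from the CMS instead of streaming the files through your own instances. The host and container names below are placeholders:

```python
# Placeholder account, CDN endpoint, and container names.
STORAGE_HOST = "https://<account>.blob.core.windows.net"
CDN_HOST = "https://<cdn-endpoint>.azureedge.net"  # optional CDN in front of the container
CONTAINER = "images"

def image_url(name: str, use_cdn: bool = True) -> str:
    """Return the URL the CMS should emit in its <img> tags."""
    host = CDN_HOST if use_cdn else STORAGE_HOST
    return f"{host}/{CONTAINER}/{name}"

print(image_url("2012/header-logo.png"))
```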
