Access Azure Storage Blob as a file system

We have a Worker role on Windows Azure that runs ffmpeg to convert media files using MediaHandler Pro. The files that we want to process are stored in blob storage, and the resulting files should also be stored there.
Our problem is that ffmpeg works on local files and not on URIs from blob storage. Is there any way to mount a blob storage container and access the files there directly as a file system?
If this is not possible, is it OK to download the files (they can be quite large, perhaps 1-2 GB) to the local file system*, process them there, and then upload them? This sounds redundant.
*) We have set up a CloudDrive that downloads this blob to a virtual disk.

You have a couple of ways of doing this: you can either create a cloud drive (a VHD uploaded as a page blob) and mount it, or you can download the source files locally and work on scratch (local temp) disk. Of the two choices, I would download locally and use scratch disk.
If you were to use a cloud drive, there would be three primary problems. First, it is a VHD and you have to mount it to get the files. Second, only one instance can mount it for read/write, so you cannot split the work of encoding source files across multiple workers saving to the same drive. Third, it is the slowest of all the storage options. For encoding, it is probably not a great choice.
Your best bet is to download the source files from blob storage (which is very fast, by the way) into a 'Local Resource' (aka scratch disk) and work from there, then upload the resulting file to blob storage.
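A minimal C# sketch of the download, encode, upload loop on the scratch disk, assuming a .NET Core 2.1+ runtime and an `ffmpeg` executable on the PATH. To stay self-contained it only plans the local paths and builds the ffmpeg command; the transfer steps are left as comments naming the Azure.Storage.Blobs (v12) calls `BlobClient.DownloadTo` and `BlobClient.Upload`. `PlanEncode`, `FfmpegCommand`, `sourceBlob`, `resultBlob` and the `encoded/` prefix are hypothetical names, not part of the original setup.

```csharp
using System;
using System.Diagnostics;
using System.IO;

// Hypothetical helper: decide where a source blob lands on the scratch disk,
// where ffmpeg writes its output, and which blob name the result is uploaded to.
static (string LocalInput, string LocalOutput, string OutputBlob) PlanEncode(
    string scratchRoot, string sourceBlobName, string targetExtension)
{
    // Blob names use '/' as a virtual folder separator; keep only the last segment locally.
    string fileName = sourceBlobName.Substring(sourceBlobName.LastIndexOf('/') + 1);
    string baseName = Path.GetFileNameWithoutExtension(fileName);
    return (Path.Combine(scratchRoot, "in", fileName),
            Path.Combine(scratchRoot, "out", baseName + targetExtension),
            "encoded/" + baseName + targetExtension); // "encoded/" prefix is an assumption
}

// Builds (but does not start) the ffmpeg process for one file.
static ProcessStartInfo FfmpegCommand(string input, string output)
{
    var psi = new ProcessStartInfo("ffmpeg") { UseShellExecute = false };
    psi.ArgumentList.Add("-y");   // overwrite a partial output left by an earlier attempt
    psi.ArgumentList.Add("-i");
    psi.ArgumentList.Add(input);
    psi.ArgumentList.Add(output);
    return psi;
}

// In the worker role, scratchRoot would be the local resource's RootPath.
var plan = PlanEncode(Path.GetTempPath(), "uploads/2012/clip.avi", ".mp4");
// 1. Download: create the in/ and out/ folders, then sourceBlob.DownloadTo(plan.LocalInput)
// 2. Encode:   Process.Start(FfmpegCommand(plan.LocalInput, plan.LocalOutput)), then WaitForExit()
// 3. Upload:   resultBlob.Upload(plan.LocalOutput, overwrite: true), then delete both local files
Console.WriteLine(plan.OutputBlob); // encoded/clip.mp4
```

Because each worker downloads its own copy to its own scratch disk, several instances can encode different blobs in parallel, which the single read/write cloud drive does not allow.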

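If your systems support SMB 3.0, you can simply map an Azure Storage file share as a drive using the Azure File storage features now available. Note that this works for file shares, not for Blob containers, so the media files would need to be stored in a share.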

Related

Transfer failed while trying to upload file of size 300MB in Azure Storage Explorer under my Container

My transfer failed while trying to upload a file of size 300 MB in Azure Storage Explorer, but when I create a new folder under my Container, I am able to upload it successfully.
I would like to understand why it worked when creating a folder, but did not work when I tried to upload directly to my Container.
Blob containers are the structures used to store blobs. An individual blob container can hold anywhere from zero to an unlimited number of individual blobs. By default, all blobs stored in a container share the same access level, either private or public.
I tried uploading 300 MB files to blob containers (in the same folder of the container; the folder is named standardmode) in Azure Storage Explorer, in both standard storage (hot and cool tiers) and Data Lake Storage accounts that already had a few files in them, and they uploaded successfully.
Here are images related to them:
First, I uploaded a few files to the container named “standardmode”.
Then I selected files of size 300 MB to upload to the same container.
The files then uploaded successfully.
I then tried the same for Data Lake Storage and successfully uploaded files larger than 300 MB in one go.
In your case, it may be a temporary issue.
(Or)
You may check the version of the service in use.
Each block in a block blob can be a different size, up to the maximum size permitted for the service version in use.
Reference:
Understanding Block Blobs, Append Blobs, and Page Blobs
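To make the block-size point concrete, the sketch below computes how many blocks an upload needs for a given block size and compares it with the 50,000-block limit for a single block blob. The 8 MiB block size is an illustrative assumption, not the value Storage Explorer actually uses, and `BlocksNeeded` is a hypothetical helper.

```csharp
using System;

const int MaxBlocksPerBlob = 50_000; // documented maximum number of blocks in one block blob

// Number of blocks needed to upload fileBytes when each block holds blockBytes.
static long BlocksNeeded(long fileBytes, long blockBytes) =>
    (fileBytes + blockBytes - 1) / blockBytes; // ceiling division

long fileSize = 300L * 1024 * 1024;  // the 300 MB file from the question
long blockSize = 8L * 1024 * 1024;   // 8 MiB, an illustrative choice
long blocks = BlocksNeeded(fileSize, blockSize);
Console.WriteLine($"{blocks} blocks, within limit: {blocks <= MaxBlocksPerBlob}"); // 38 blocks, within limit: True
```

A 300 MB file is far below the limit at any reasonable block size, which is consistent with the suggestion that the failure was transient rather than a size problem.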
(or)
You may try uploading the files using the AzCopy command.
References for using AzCopy:
Upload files to Azure Blob storage by using AzCopy (storage-use-azcopy-blobs-upload)
Getting started with AzCopy

Downloading parquet files from Azure Blob Storage. File and folder with same names

I have created parquet files on Azure Blob Storage, and now I want to download them. The problem is that it keeps failing. I think it's because there is a file and a folder with the same name. Why is that? Do I just need the folder, since the file is only 0 B?
The error I get looks like:
It's saying that because it already downloaded the 0 B file.
As mentioned in the comments, instead of downloading the actual files, you might have downloaded the 0 B placeholder block blob that marks the folder. Tools that use Azure Blob Storage as a file system create such placeholders to provide file-system-like access (for example, Azure HDInsight clusters have their HDFS backed by Azure Blob Storage). Object stores like Azure Blob Storage and AWS S3 have no real directories, so these custom implementations provide (or simulate) a seamless file-system-like experience.
In short, don't download the 0 B placeholder blob. Download the actual files inside the folder.
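If the download is scripted, one simple rule is to skip a zero-byte blob whenever other blobs use its name followed by `/` as a prefix, since that blob is only a folder marker. The sketch below applies that rule to an in-memory listing; the `(Name, Length)` pairs stand in for what a real listing call returns, and `FilesToDownload` and the sample names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Returns the blob names worth downloading from a listing of (name, size) pairs.
static List<string> FilesToDownload(IEnumerable<(string Name, long Length)> blobs)
{
    var list = blobs.ToList();
    return list
        // Drop a 0-byte blob when other blobs live "under" it (name + "/"):
        // it is only a folder placeholder, not real data.
        .Where(b => !(b.Length == 0 &&
                      list.Any(o => o.Name.StartsWith(b.Name + "/", StringComparison.Ordinal))))
        .Select(b => b.Name)
        .ToList();
}

var listing = new[]
{
    ("out.parquet", 0L),                              // placeholder with the folder's name
    ("out.parquet/_SUCCESS", 0L),                     // empty, but a real file with no children
    ("out.parquet/part-00000.snappy.parquet", 1024L),
};
Console.WriteLine(string.Join(", ", FilesToDownload(listing)));
// out.parquet/_SUCCESS, out.parquet/part-00000.snappy.parquet
```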

Azure: Is there a way to cache/reuse files downloaded from Azure blob storage?

I have a file upload/download service that uploads files to Blob storage. I have another service (a job service) that needs to download files from the file service (using the blob storage URLs) and process them. The files are read-only (they will not change during their lifetime), and in many cases the same file is used by different jobs. I am trying to figure out whether I can download a file once and have all instances of my job service use that downloaded copy. Can I store the downloaded file in some shared location and access it from all instances of my service? Does it even make sense to do it this way? Would the cost of fetching the file from blob storage be the same as reading it from a shared location (if that is even possible)?
Azure also provides File storage, which you can mount as a drive to access its contents.
But for this, you need to download the file once and then upload it to File storage.
Then you can mount the share on any virtual machine instance or as a local drive.
That is an alternative way to achieve your goal.
Check this:
http://www.ntweekly.com/?p=10034
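One way to get "download once, reuse from every instance" is a small cache helper pointed at a folder all instances can reach, such as the mounted Azure Files share. The sketch below assumes that folder path and that each blob maps to a unique, file-system-safe `key`; `GetOrFetch` and the `fetch` callback (which would wrap something like `BlobClient.DownloadTo`) are hypothetical. Downloading to a temporary name and then moving it into place is intended to keep other instances from reading a half-written file.

```csharp
using System;
using System.IO;

// Returns a path to the cached copy of `key`, calling `fetch` only if it is not cached yet.
static string GetOrFetch(string cacheDir, string key, Action<string> fetch)
{
    Directory.CreateDirectory(cacheDir);
    string target = Path.Combine(cacheDir, key);
    if (File.Exists(target))
        return target;                        // already cached: no blob read needed

    string temp = target + "." + Guid.NewGuid().ToString("N") + ".tmp";
    fetch(temp);                              // e.g. wraps a blob download to `temp`
    try { File.Move(temp, target); }          // publish only a complete file
    catch (IOException) when (File.Exists(target))
    {
        File.Delete(temp);                    // another instance finished first; use its copy
    }
    return target;
}

// Demo: a unique temp folder stands in for the mounted share.
string dir = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"));
int calls = 0;
string p1 = GetOrFetch(dir, "report.pdf", path => { calls++; File.WriteAllText(path, "data"); });
string p2 = GetOrFetch(dir, "report.pdf", path => { calls++; File.WriteAllText(path, "data"); });
Console.WriteLine($"{p1 == p2}, fetched {calls} time(s)"); // True, fetched 1 time(s)
```

Whether reading from the share is cheaper than reading the blob again depends on the pricing of the two services; the sketch only avoids repeated downloads.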

Azure storage for files in specific folder structure

Currently I have an FTP server with a deep structure of folders and files; it can be as many as 10 levels below the root folder. Since I have already migrated my local database to an Azure database successfully, I wonder whether there is an Azure FTP service I could use to migrate this as well. I know there is Azure Storage, where I could create a container of type File or Blob. Could either of those be used like an FTP server? Could I create a folder structure there using a container with File or Blob storage, and how does that work? Is either a blob container or a file share suited to this purpose?
Let me add to what NDJ has written: both Azure Blobs and Azure Files would serve your purpose.
As mentioned by NDJ, Azure Blob Storage is a 2-level hierarchy system. At the top you have a blob container, and each blob container contains 0 or more blobs. So it does not support a folder structure per se, but as NDJ mentioned, you can create the illusion of a subfolder by using an appropriate blob delimiter (usually /). Compared with a local file system, a directory at the root level (C:\) is a container in blob storage, and the files go in there. So imagine you have a folder called images in C:\ on your computer; that would be a container in blob storage. Now imagine that you have 2 subfolders beneath this folder (let's call them hires and lores), and both of them contain some files (say image1.png). When you move them to Azure Blob Storage, the container name would be images, but the blob names would be hires/image1.png and lores/image1.png. Some storage explorers take this delimiter (/) and show you that your container contains 2 folders, each holding an image called image1.png, but in reality there are only 2 blobs in that blob container.
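The images/hires/lores example can be sketched in a few lines: local paths become blob names by using `/` as the delimiter, and "folders" are recovered only by splitting those names. In the real service a delimiter-based listing (for example `BlobContainerClient.GetBlobsByHierarchy` in Azure.Storage.Blobs v12) returns these prefixes for you; the in-memory `ToBlobName` and `VirtualFolders` helpers below are hypothetical illustrations of the same idea.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Turns a path relative to the container's root folder (e.g. hires\image1.png) into a blob name.
static string ToBlobName(string relativePath) =>
    relativePath.Replace('\\', '/').TrimStart('/');

// Emulates a delimiter listing: the distinct "folders" directly under `prefix`.
static IEnumerable<string> VirtualFolders(IEnumerable<string> blobNames, string prefix) =>
    blobNames
        .Where(n => n.StartsWith(prefix, StringComparison.Ordinal))
        .Select(n => n.Substring(prefix.Length))
        .Where(rest => rest.Contains('/'))
        .Select(rest => prefix + rest.Substring(0, rest.IndexOf('/') + 1))
        .Distinct();

// Container "images" holds exactly two blobs; the folders exist only inside the names.
var blobs = new[] { ToBlobName(@"hires\image1.png"), ToBlobName(@"lores\image1.png") };
Console.WriteLine(string.Join(", ", blobs));                     // hires/image1.png, lores/image1.png
Console.WriteLine(string.Join(", ", VirtualFolders(blobs, ""))); // hires/, lores/
```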
Azure File Service is a close match to your local file system. At the top level, you've got a Share, and each share contains directories and files. Each directory can in turn contain many directories and files.
As NDJ mentioned, there's no FTP access to Azure Storage but there are many tools that will allow you to upload files from local computer to Azure Storage and many of them will preserve the file hierarchy. You can always write code to upload the files yourself. If you decide to use Azure Files, you can simply mount a File Storage Share as a network drive on your local computer and then transfer the files from your local computer to Azure Files as if you're transferring files from one drive to another.
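For the "write code to upload the files yourself" route, the key step is deriving each blob name from the file's path relative to the root folder, so a 10-level FTP tree maps one-to-one onto blob names. A sketch, assuming the tree has already been copied to a local folder and a .NET Core 2.0+ runtime; `PlanUpload` is a hypothetical helper, and the actual upload (for example `BlobContainerClient.GetBlobClient(name).Upload(path)` in Azure.Storage.Blobs v12) is left out.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Pairs every file under localRoot with a blob name that preserves its folder path.
static List<(string LocalPath, string BlobName)> PlanUpload(string localRoot) =>
    Directory.EnumerateFiles(localRoot, "*", SearchOption.AllDirectories)
        .Select(f => (LocalPath: f, BlobName: Path.GetRelativePath(localRoot, f).Replace('\\', '/')))
        .OrderBy(p => p.BlobName, StringComparer.Ordinal)
        .ToList();

// Demo: build a small nested tree in a temp folder and print the planned blob names.
string root = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(Path.Combine(root, "a", "b"));
File.WriteAllText(Path.Combine(root, "a", "b", "deep.txt"), "x");
File.WriteAllText(Path.Combine(root, "top.txt"), "y");
foreach (var (local, blob) in PlanUpload(root))
    Console.WriteLine($"{blob}  <=  {local}"); // each file would be uploaded under this blob name
```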
UPDATE
Regarding difference between Azure Blob Storage and File Storage, both are used to store files. There are a few differences that I could think of:
A Share in Azure File Storage can be mounted as a network drive on your local computer/Azure VM, whereas a Blob Container in Azure Blob Storage can't. So if you have an application that writes files to the local file system, you can take the application as is, point it at Azure File Storage, and write files to that network drive without many changes to your code (a typical example of a lift-and-shift application).
You can set an ACL on a Blob Container, whereas you can't do the same on a Share. This makes Azure Blob Storage ideal for storing static content (images, CSS, JS) for your websites. To expose files in File Storage, you would need to resort to a Shared Access Signature.
You can set the size of a Share (default is 5GB), whereas no such limit exists for a Blob Container; a blob container can grow up to the size of the storage account.
To understand Azure Files, I would recommend reading this: https://azure.microsoft.com/en-in/documentation/articles/storage-dotnet-how-to-use-files/.
Azure Blob Storage supports 10 levels down (in fact, up to 254 path segments). Basically, the files are stored non-hierarchically, but each / separator gives the appearance of directories.
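A quick check of a deep path against those limits is sketched below. It assumes the documented blob-name rules of 1 to 1,024 characters and at most 254 `/`-separated path segments; `IsValidBlobName` is a hypothetical helper that ignores the other naming recommendations (for example, avoiding names that end in a dot or a slash).

```csharp
using System;
using System.Linq;

// Checks a blob name against the length and path-segment limits only.
static bool IsValidBlobName(string name)
{
    const int MaxNameLength = 1024;  // characters
    const int MaxPathSegments = 254; // '/'-separated segments
    return name.Length >= 1
        && name.Length <= MaxNameLength
        && name.Split('/').Length <= MaxPathSegments;
}

// A path 10 folders deep, as in the FTP tree from the question.
string deep = string.Join("/", Enumerable.Range(1, 10).Select(i => $"level{i}")) + "/file.txt";
Console.WriteLine($"{deep}: {IsValidBlobName(deep)}"); // prints the path followed by ": True"
```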
It's relatively trivial to write something to move files to Azure. As far as I know, there is no FTP functionality yet, but it has been requested. It looks like some people have already written code for this.
You can now use Storage Explorer across all platforms to easily work within any folder structure.

Azure WCF accessing disk files

I have a WCF service hosted on Windows Azure as a "cloud service." When the service starts, it needs to load data from files on disk into memory so that it can be accessed quickly (in other words, cached). Right now I'm using a folder like C:\Documents\Filestoprocess: the WCF service reads that folder and loads the data from its files into memory. I have about 5,000 small files. How do I do this in Azure? Is there a folder path that I can use within the WCF service so that it opens each of these files and loads its data? I'm not really looking for complicated Blob access over the network using bandwidth. I'm looking for simple disk I/O access to these files from the WCF "cloud service" that is running on its own public web address.
You should use a cloud storage service to store data, because anything you write to the local file system can be destroyed when the service is restarted or recycled.
You can look into using the Azure Drive service, which is like creating a disk drive on top of blob storage.
But if you really want to write and read data on the local file system check out this blog post http://blog.codingoutloud.com/2011/06/12/azure-faq-can-i-write-to-the-file-system-on-windows-azure/
It talks about setting up your service definition to allow writing to the local file system.
Depending on the size of your instances, you'll get a non-persistent disk where you can store this kind of temporary data. The minimum is 20GB for an extra small instance. You shouldn't access the disk directly; instead, use a local resource, which you can configure in your service definition file or in Visual Studio (double-click your Web / Worker Role).
This storage is non-persistent: if you delete your deployment, decrease the number of instances, or run into hardware problems, you lose all data saved there. If you want to persist your files, use blob storage instead. But in your case, where you need the files as a kind of caching mechanism, local resources are perfect.
And if your goal is to cache data you might want to take a look at the caching features included in Windows Azure: Caching in Windows Azure
Blob access is not complex. In fact, you could do a single download of a zip file from blob storage to local disk, unzip it, then prime your WCF service from those 5,000 small files.
Check out this MSDN page documenting DownloadToFile(). The essential parts (Windows Azure storage client library 1.x):
using Microsoft.WindowsAzure;               // StorageCredentialsAccountAndKey
using Microsoft.WindowsAzure.StorageClient; // CloudBlobClient, CloudBlob
// blobEndpoint, accountName and accountKey come from your storage account settings.
CloudBlobClient blobClient =
    new CloudBlobClient(blobEndpoint, new StorageCredentialsAccountAndKey(accountName, accountKey));
// Return a reference to the blob.
CloudBlob blob = blobClient.GetBlobReference("mycontainer/myblob.txt");
// Download the blob to a local file.
blob.DownloadToFile("c:\\mylocalblob.txt");
Now: I don't agree with saving to the root folder on C:. Rather, you should grab some local storage (easily configurable). Once you configure local storage in your role configuration, just ask the role environment for it and ask for its root path:
using Microsoft.WindowsAzure.ServiceRuntime; // RoleEnvironment
// "mylocalstorage" must match the LocalStorage name in your service definition.
var localResource = RoleEnvironment.GetLocalResource("mylocalstorage");
var rootPath = localResource.RootPath;
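The zip-based priming suggested above might look like the sketch below, assuming the archive has already been downloaded (for instance with the `DownloadToFile` call) into the local resource whose `RootPath` is passed in, and a .NET Core 2.0+ runtime. `PrimeCache`, the `data` subfolder and the string-valued dictionary are illustrative choices; reading each file as text assumes the 5,000 files are text files.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Unpacks the downloaded archive under the local resource root and loads every file
// into memory, keyed by its path inside the archive.
static Dictionary<string, string> PrimeCache(string zipPath, string rootPath)
{
    string target = Path.Combine(rootPath, "data");
    if (Directory.Exists(target))
        Directory.Delete(target, recursive: true); // start clean if an earlier run left files
    ZipFile.ExtractToDirectory(zipPath, target);

    var loaded = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
    foreach (string file in Directory.EnumerateFiles(target, "*", SearchOption.AllDirectories))
        loaded[Path.GetRelativePath(target, file)] = File.ReadAllText(file);
    return loaded;
}

// Demo: a temp folder stands in for localResource.RootPath, and a locally built zip
// stands in for the one downloaded from blob storage.
string root = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"));
string src = Path.Combine(root, "src");
Directory.CreateDirectory(src);
File.WriteAllText(Path.Combine(src, "a.txt"), "alpha");
File.WriteAllText(Path.Combine(src, "b.txt"), "beta");
string zip = Path.Combine(root, "files.zip");
ZipFile.CreateFromDirectory(src, zip);

var cache = PrimeCache(zip, root);
Console.WriteLine($"{cache.Count} files cached, a.txt = {cache["a.txt"]}"); // 2 files cached, a.txt = alpha
```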
Note: As @KingPancake mentioned, you could use an Azure drive. However, remember that an Azure drive can be mounted writable by only one instance at a time. You'd need to make additional snapshots for your other instances. I think it's much simpler to go with a simple blob, copy your files down (either as a single zip or as individual files), and go from there.
You mentioned concern about network and bandwidth. You don't pay for bandwidth within the same data center. Also, it's extremely fast: 100Mbps per core. So even with a Small instance, your files will be copied down very quickly, and more so with larger instance sizes.
One last thought: the only other ways to access your 5,000 files without using blob storage or Azure Drives (which are mounted as VHDs in blob storage) would be to download the files from an external source or bundle them with your Windows Azure package (they'd then show up in your app's folder, under whatever subfolder you put them in). Bundling has two downsides:
Longer time to upload your deployment package due to added size
Inability to change any of the individual files without redeploying the package.
By storing in a blob, you can easily change one (or all) of your small files without redeploying your code - you'd just need to signal it to either re-read from blob storage or restart the instances so they automatically download the new files.
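One way to implement the "re-read from blob storage" signal is to poll the blob's ETag and reload only when it changes. The sketch below is an illustrative choice, not something the answer prescribes: it assumes the version string comes from something like `BlobClient.GetProperties().Value.ETag.ToString()` in Azure.Storage.Blobs v12, and `MakeRefresher`, `getVersion` and `reload` are hypothetical names.

```csharp
#nullable enable
using System;

// Returns a function that reloads only when the blob's version marker has changed.
static Func<bool> MakeRefresher(Func<string> getVersion, Action reload)
{
    string? loadedVersion = null;
    return () =>
    {
        string current = getVersion();
        if (current == loadedVersion)
            return false;          // unchanged: keep serving the in-memory copy
        reload();                  // e.g. download the zip again and re-prime the cache
        loadedVersion = current;
        return true;
    };
}

// Demo: a local variable stands in for the blob's ETag.
string etag = "\"0x1\"";
int reloads = 0;
var refresh = MakeRefresher(() => etag, () => reloads++);
refresh();                 // first call loads
refresh();                 // same ETag: no reload
etag = "\"0x2\"";          // the blob was replaced
refresh();                 // new ETag: reload
Console.WriteLine($"reloads: {reloads}"); // reloads: 2
```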
