Deep link to text file in Azure Data Lake Store

I am trying to quickly access text files via URL. The Azure portal (http://portal.azure.com) can (at best) link to the explore view of a specific folder, but I have not found any way to deep link into a specific file.
I also tried Azure Storage Explorer, which does support adl:// URLs, but (apart from opening slowly) it only browses to the folder and doesn't actually open the file.
My use case is that at the end of each data processing job, I want to print a URL to open a text file for browsing.
Any ideas or workarounds?

ADLS does not allow anonymous access to stored files. Access must be authorized, so a file cannot be opened directly via its URL.
Given your situation, I suggest creating your own endpoint (for example, an Azure Function) that acts as an authorized proxy to the resources. You would call the Azure Function with the URL of the file you want to open as a parameter; the function then requests the file content and returns it for display.
In addition, to keep file access secure, pay attention to access control in Azure Data Lake Store.
Hope it helps you.
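As a rough sketch of the proxy idea above, a job could print a link to its own endpoint with the file path passed as a query parameter. The endpoint URL, the `path` parameter name, and the example file path are all assumptions made for illustration; the proxy itself would still have to authenticate against ADLS and enforce access control.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical proxy endpoint (for example an Azure Function); not a real URL.
PROXY_ENDPOINT = "https://example-proxy.azurewebsites.net/api/view"


def build_view_link(file_path: str, endpoint: str = PROXY_ENDPOINT) -> str:
    """Return a link to the proxy with the file path URL-encoded as a parameter."""
    return f"{endpoint}?{urlencode({'path': file_path})}"


def extract_path(link: str) -> str:
    """What the proxy would do first: read the requested file path back out."""
    return parse_qs(urlsplit(link).query)["path"][0]


# At the end of a processing job, print the link for the output file.
link = build_view_link("/output/2019/job-42/result.txt")
print(link)
```

The proxy would then fetch the file with its own credentials and return the text, so the printed link is only as open as the proxy's own authorization allows.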

Related

Azure storage options to serve content on Azure Web App

I am a total newbie to Azure Web Apps and storage, and I need some clarification/confirmation. The main thing to note is that my application (described below) requires a folder hierarchy. Blob storage seems out of the question, and a file share doesn't allow anonymous access unless I use a Shared Access Signature (SAS).
Am I understanding Azure storage correctly, it's either you fit into the Azure storage model or you don't?
Can anyone advise how I can achieve what's required by the CMS application as described below by using Blobs?
The only option I see is to find a way to change the CMS application so that it always has the SAS in the URL to every file it requests from storage in order to serve content on my Web App? If so, is it a problem if I set my SAS to expire sometime in the distant future?
https://<appname>.file.core.windows.net/instance1/site1/file1.jpg?<SAS>
Problem with using Blob
So far my understanding is that Blob storage doesn't allow "sub folders" as it's a container that holds unstructured data, therefore I'm unable to use this based on my application (described below) as it requires folder structure.
The problem with using File Share
File share seemed perfect as it allows for folder hierarchy, naturally that's what I've used.
However, no anonymous access is allowed for files stored in file storage, the access needs to be authorised. One way of authorising the access is to create a SAS on a file/share level with Read permission and then using that SAS URL to access the file.
Cannot access Windows azure file storage document
My application
I've created a Linux Web App running an open source CMS application. This application allows the creation of multiple websites, with each website's content (images, docs, multimedia) stored on a file server. These files are then served to the website via a defined URL.
The CMS application has a setting for the location where it should save its files, which would be a folder on the file server. It then creates a new subfolder in that location for every site it hosts.
Example folder hierarchy
/instance1
    /site1
        /file1
        /file2
    /site2
        /file1
        /file2
Am I understanding Azure storage correctly, it's either you fit into
the Azure storage model or you don't?
You can use the Azure storage model for your CMS application, with either Blob Storage or a file share.
Can anyone advise how I can achieve what's required by the CMS
application as described below by using Blobs?
You can use a Data Lake Gen 2 storage account if you want to use Azure Blob Storage.
Data Lake Gen 2 storage enables a hierarchical namespace, so you can use subfolders in Blob Storage as your application requires.
Problem with using Blob
Blob Storage allows subfolders if you use a Data Lake Gen 2 storage account. You can also enable public anonymous access for blobs.
The problem with using File Share
Azure File Share supports a folder hierarchy but does not allow public anonymous access. You can use an Azure Managed Identity (system-assigned) for your web app to access the Azure File Share.
Your application would then be able to access the Azure File Share without a SAS token.
The issue of not having real folders in a blob storage shouldn't be any issue for your use case. Just because it doesn't have your traditional folders doesn't mean it can't serve content on e.g. instance1/site1/file1. That's still possible but the instance1/site1/ will just be part of the name of the blob.
Tools like the Azure Portal or Storage Explorer will actually show folders by using the delimiter / and querying data that appears to be inside a folder by using the path as prefix.
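To illustrate the prefix-and-delimiter idea, here is a minimal sketch in plain Python that derives "folders" from a flat list of blob names. The function name and the sample blob names are made up for this example, and it only mimics the listing behaviour described above rather than calling any storage API.

```python
def list_virtual_folder(blob_names, prefix="", delimiter="/"):
    """Return (folders, files) that appear directly under `prefix`.

    Blob names are flat strings; a "folder" is just a shared name prefix
    that ends at the next delimiter.
    """
    folders, files = set(), []
    for name in blob_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # Something follows the delimiter, so `head` acts as a subfolder.
            folders.add(prefix + head + delimiter)
        else:
            files.append(name)
    return sorted(folders), sorted(files)


blobs = [
    "instance1/site1/file1",
    "instance1/site1/file2",
    "instance1/site2/file1",
    "instance1/site2/file2",
]
print(list_virtual_folder(blobs, prefix="instance1/"))
print(list_virtual_folder(blobs, prefix="instance1/site1/"))
```

The first call reports two virtual folders and no files; the second reports the two blobs whose names start with instance1/site1/.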

Kentico Azure blob integration

On my Kentico project I have integrated Azure blob storage instead of saving files locally. I followed this article: https://docs.kentico.com/k12/custom-development/working-with-physical-files-using-the-api/configuring-file-system-providers/configuring-azure-storage
Things are working all right except for one problem: all the files are now publicly accessible. There are some PDF files in the media library that I want only logged-in users to view, but now anyone can view them. Is there any workaround for this issue?
Files in the Media Library are always accessible via a direct link, and you can't restrict them to logged-in users only, regardless of whether they are in Azure storage or on local disk.
But there are two ways of achieving this:
- Presentation-only restrictions. When you present those PDF links on the website, display them only to logged-in users. The files will still be accessible via direct links, but only logged-in users will see the links.
- Hard restrictions. As far as I know, these restrictions can be set up only for files stored in the CMS tree. This approach checks permissions when files are accessed via direct link.
If you are storing files in blob storage, there is no way. You can restrict access to the whole container (or an individual blob) with a SAS token, but not to a specific folder. A folder is a purely virtual structure; it exists only in the file path.

What is the best strategy for using Windows Azure as a file storage system - with http download capabilities

I need to store multiple files that users upload, and then provide these users with the capability of accessing their files via http. There are two key considerations:
- Storage (which is my primary concern here)
- Security (which let's leave aside for now)
The question is:
What is the most cost efficient and performant way of storing all these files and giving access to them later? I believe the answer is:
- Store files within Azure Storage Account, and have a key that references them in an SQL Azure database.
Am I correct on this?
Is a blob storage flat? Or can I create something like folders inside it to better organize my files?
The idea of using SQL Azure to store metadata for your blobs is a pretty common scenario, which allows you to take advantage of SQL for searching, and blobs for storage.
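A minimal sketch of that metadata pattern follows, using an in-memory SQLite database as a stand-in for SQL Azure. The table layout, column names, and sample values are assumptions for illustration; the file content itself would live in blob storage, and only the reference is kept in SQL.

```python
import sqlite3

# In-memory SQLite stands in for SQL Azure in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE files (
        id            INTEGER PRIMARY KEY,
        owner         TEXT NOT NULL,
        original_name TEXT NOT NULL,
        container     TEXT NOT NULL,
        blob_name     TEXT NOT NULL UNIQUE
    )
    """
)


def register_upload(owner, original_name, container, blob_name):
    """Record where an uploaded file was stored (the blob holds the bytes)."""
    with conn:
        conn.execute(
            "INSERT INTO files (owner, original_name, container, blob_name) "
            "VALUES (?, ?, ?, ?)",
            (owner, original_name, container, blob_name),
        )


def find_blobs(owner, name_pattern):
    """Search by metadata in SQL and return (container, blob_name) references."""
    return conn.execute(
        "SELECT container, blob_name FROM files "
        "WHERE owner = ? AND original_name LIKE ? ORDER BY blob_name",
        (owner, name_pattern),
    ).fetchall()


register_upload("alice", "report.doc", "mycontainer", "alice/0001-report.doc")
register_upload("alice", "photo.jpg", "mycontainer", "alice/0002-photo.jpg")
print(find_blobs("alice", "%.doc"))
```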
Blobs are organized by container. So you'd have something like:
http://mystorage.blob.core.windows.net/mycontainer/myfile.doc
You can also simulate a hierarchy using a delimiter, but in reality there's just container plus blob.
If you keep the container or blob private, the user would either have to go through your web front end (or web service), or you'd have to provide them with a special URL with a Shared Access Signature appended, which is a time-limited URL.
I would recommend taking a look at the BlobShare Sample, a simple file sharing application that demonstrates the storage services of the Windows Azure Platform together with the authentication and authorization capabilities of the Access Control Service (ACS). The full sample code is located at the following link:
http://blobshare.codeplex.com/
You can use this sample code immediately, just by adding a proper reference to your Windows Azure account credentials. The best thing about this sample is that you can provide blob access directly through the Access Control Service. You can also modify the code to add SAS support as well as blob download from public containers. Once you have it working and understand the concept, you can tweak it to work the way you want.

Windows azure requests

I have an application deployed on Windows Azure. The application has a reporting part, and the reports work as follows:
1. The application generates the report as a PDF file and saves it in a certain folder in the application.
2. A PDF viewer in the application takes the URL of the file and displays it.
As you know, in Windows Azure I will have several VMs behind a load balancer, so I cannot ensure that the request in step 2 will go to the same VM as in step 1, and this causes a problem for me.
Any help is very appreciated.
I know that I can use BLOB, but this is not the problem.
The problem is that after creating the file on a certain VM, I give the PDF viewer the URL of the PDF file as "http://..../file.pdf". This will generate a new request that I cannot control, and I cannot know which VM will serve it, so even if I saved the file in the BLOB it would not solve my problem.
As in any farm environment, you have to consider saving files in storage that is common to all machines in the farm. In Windows Azure, that common storage is Windows Azure Blob Storage.
You have to make some changes to your application so that it saves the files to Blob storage. If these are public files, just mark the Blob container as public and provide the full URL of the file in the blob to the PDF viewer.
If your PDF files are private, you have to mark your container as private. The second step is to generate a Shared Access Signature URL for the PDF and provide that URL to the PDF viewer.
Furthermore, while developing you can explore your Azure storage using any of the (freely and not so freely) available tools for Windows Azure Storage. Here are some:
Azure Storage Explorer
Azure Cloud Storage Studio
There are a lot of samples showing how to upload a file to Azure Storage. Just search for them with your favorite search engine. Check out these resources:
http://msdn.microsoft.com/en-us/library/windowsazure/ee772820.aspx
http://blogs.msdn.com/b/windowsazurestorage/archive/2010/04/11/using-windows-azure-page-blobs-and-how-to-efficiently-upload-and-download-page-blobs.aspx
http://wely-lau.net/2012/01/10/uploading-file-securely-to-windows-azure-blob-storage-with-shared-access-signature-via-rest-api/
The Windows Azure Training Kit has great lab named "Exploring Windows Azure Storage"
Hope this helps!
UPDATE (following question update)
The problem is that after creating the file on a certain VM, I give
the PDF viewer the URL of the PDF file as "http://..../file.pdf".
This will generate a new request that I cannot control, and I cannot
know which VM will serve it, so even if I saved the file in the BLOB it
would not solve my problem.
Try changing your logic a bit and follow my instructions. When your VM creates the PDF, upload the file to a blob. Then give the full blob URL of your PDF file to the PDF viewer. That way the request will not go to any VM, just to the blob. The full blob URL will be something like http://youraccount.blob.core.windows.net/public_files/file.pdf
Or am I missing something? As I understand it, your process flow is as follows:
1. The user makes a special request that causes PDF file generation.
2. The file is generated on the server.
3. The full URL to the file is sent back to the client so that a client PDF viewer can render it.
If this is the flow, then with the suggested changes it will look like the following:
1. The user makes a special request that causes PDF file generation.
2. The file is generated on the server.
3. The file is uploaded to BLOB storage.
4. The full URL of the file in the BLOB is returned to the client, so that it can be rendered on the client.
What is not clear? Or what is different in your process flow? I do exactly the same for on-the-fly report generation and it works quite well. The only difference is that my app is Silverlight-based and I force a file download instead of displaying the file inline.
An alternative approach is not to persist the file at all.
Rather, generate it in memory, set the content type of the response to "application/pdf" and return the binary content of the report. This is particularly easy if you're using ASP.NET MVC, but you can use a HttpHandler instead. It is a technique I regularly use in similar circumstances (though lately with Excel reports rather than PDF).
The usefulness of this approach does depend on how you're generating the PDF, how big it is and what the load is on your application.
But if the report is to be served just once, persisting it just so that another request can be made by the browser to retrieve it is just wasteful (and you have to provide the persistence mechanism).
If the same file is to be served multiple times and it is resource-intensive to create, it makes sense to persist it, then.
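The answer above talks about ASP.NET MVC or an HttpHandler; as a language-neutral illustration of the same idea, here is a small sketch written as a Python WSGI application instead. The `render_report_pdf` function is a hypothetical stand-in that returns placeholder bytes rather than a real PDF, and the file name in the header is likewise an assumption.

```python
def render_report_pdf() -> bytes:
    """Hypothetical stand-in for the real report generator.

    Returns placeholder bytes, not a valid PDF document.
    """
    return b"%PDF-1.4\n% placeholder report body\n"


def report_app(environ, start_response):
    """WSGI app that builds the report in memory and streams it back.

    Nothing is written to disk, so it does not matter which VM behind the
    load balancer handles the request.
    """
    body = render_report_pdf()
    headers = [
        ("Content-Type", "application/pdf"),
        ("Content-Length", str(len(body))),
        ("Content-Disposition", 'inline; filename="report.pdf"'),
    ]
    start_response("200 OK", headers)
    return [body]


# To try it locally one could serve `report_app` with the standard library's
# wsgiref.simple_server.make_server; that call is omitted here so the sketch
# does not block when run.
```

The PDF viewer is then pointed at the URL of this handler rather than at a file path, which avoids the shared-storage question entirely for reports that are served only once.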
You want to save your PDF to centralized, persistent storage. A VM's hard drive is neither. Azure Blob Storage is likely the simplest and best solution. It is dirt cheap to store and access, and the API for storing and accessing files is very simple.
There are two things you could consider.
Windows Azure Blob + Queue Storage
Blob Storage is a cost-effective way of storing binary data and sharing it between instances. You would most likely use a worker role to create the report, store it in Blob Storage, and drop a "completed" message on the queue.
Your web role instance could monitor the queue looking for reports that are ready to be displayed.
It would be similar to the concept used in the Windows Azure Guest Book app.
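The hand-off between the worker and the web role can be sketched with in-memory stand-ins. Here a dict plays the part of the blob container and `queue.Queue` plays the part of the storage queue; the function names, the blob naming scheme, and the report content are assumptions for illustration, not Azure APIs.

```python
import queue

blob_store = {}               # stand-in for a blob container: name -> bytes
ready_queue = queue.Queue()   # stand-in for a storage queue


def worker_create_report(report_id: str) -> str:
    """Worker role: build the report, store it, then announce it is ready."""
    content = f"report {report_id}".encode()   # placeholder for PDF bytes
    blob_name = f"reports/{report_id}.pdf"
    blob_store[blob_name] = content
    ready_queue.put(blob_name)                 # the 'completed' message
    return blob_name


def web_poll_ready_reports() -> list:
    """Web role: drain the queue and return the blob names that are ready."""
    ready = []
    while True:
        try:
            ready.append(ready_queue.get_nowait())
        except queue.Empty:
            return ready


worker_create_report("2012-01")
for name in web_poll_ready_reports():
    print(name, len(blob_store[name]), "bytes")
```

Because the message is queued only after the blob is stored, the web role never sees a report name before its content exists.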
Windows Azure Caching Service
Similarly (and at much greater cost), you could share the binary using the Caching Service. This gives you a common layer between your VMs in which to store things. However, you won't be able to provide a URL to the PDF; you'd have to download the binary and serve it either through an HttpHandler or by changing the content type of the response.
This would be much harder to implement, very expensive to run, and is not guaranteed to work in your scenario. I'd still suggest Blobs over any other means.
Another option would be to implement a sticky session handler of your own. Take a look at:
http://dunnry.com/blog/2010/10/14/StickyHTTPSessionRoutingInWindowsAzure.aspx

Allowing access to Azure Storage nodes to select users?

Given a stored file on Azure Storage (blobs, tables or queues -- doesn't matter), is it possible to allow access to it for all, but only based on permissions?
For example, I have a big storage of images, and a DB containing users and authorizations. I want user X to only be able to access images Y and Z. So, the URL will be generally inaccessible, unless you provide some sort of a temporary security token along with it. How's that possible?
I know I can shut the storage from the outside world, and allow access to it only through an application checking this stuff, but this would require the application to be on Azure as well, and on-premise app won't be able to deliver any content from Azure Storage.
It is my understanding that most CDNs provide such a capability, and I sure hope Azure provides a solution for this as well!
Itamar.
I don't think you can achieve this level of access filtering. The only methods I'm aware of are described in this msdn article
Managing Access to Containers and Blobs
and here is a blog post that shows a small piece of code to implement it
Using Container-Level Access Policies in Windows Azure Storage
I'm not sure this would fit your need. If I understood it right, I would do it this way:
1. Organize your content in containers that match the roles.
2. In your on-premise application, check whether the user has access and, if so, generate the right URL to give them temporary access to the resource.
Of course, this only works if users have to go through a central point to get access to the content in the blob. If they bookmark the generated link, it will fail once the expiration date has passed.
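The two steps above can be sketched as follows. This is deliberately not Azure's Shared Access Signature format: it is a generic HMAC-signed, expiring URL that shows the same idea (check permission centrally, then hand out a time-limited link). The secret, the permission table, the base URL, and the `expires`/`sig` parameter names are all assumptions; a real application would generate an actual SAS with the storage account key.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET = b"demo-secret-key"  # hypothetical signing key, for illustration only
BASE_URL = "https://example.blob.core.windows.net"  # hypothetical account
PERMISSIONS = {"userX": {"images/y.jpg", "images/z.jpg"}}  # made-up data


def _signature(resource_url: str, expires: int) -> str:
    message = f"{resource_url}\n{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign_url(resource_url: str, ttl_seconds: int, now=None) -> str:
    """Append an expiry time and a signature covering URL and expiry."""
    issued = int(time.time() if now is None else now)
    expires = issued + ttl_seconds
    query = urlencode({"expires": expires, "sig": _signature(resource_url, expires)})
    return f"{resource_url}?{query}"


def verify_url(signed_url: str, now=None) -> bool:
    """Accept the link only if the signature matches and it has not expired."""
    parts = urlsplit(signed_url)
    params = parse_qs(parts.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    resource_url = f"{parts.scheme}://{parts.netloc}{parts.path}"
    current = time.time() if now is None else now
    return hmac.compare_digest(sig, _signature(resource_url, expires)) and current <= expires


def issue_link(user: str, blob_path: str, ttl_seconds: int = 300, now=None):
    """Central check: only users with permission get a temporary link."""
    if blob_path not in PERMISSIONS.get(user, set()):
        return None
    return sign_url(f"{BASE_URL}/{blob_path}", ttl_seconds, now=now)


print(issue_link("userX", "images/y.jpg"))
```

A bookmarked link stops verifying once its expiry time has passed, which matches the behaviour described above.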
Good luck.
This is actually possible to implement with Blob storage. Consider (a) a UI that is like explorer, and (b) that users are already authenticated (could use Access Control Service, but don't need to).
The explorer-like UI could expose the resources that are appropriate to the authenticated user. The underlying access to these resources would be controlled by Shared Access Signatures at the granularity appropriate for the objects: for example, permission to see only one file in a folder, or the whole folder, or to create a file in a folder can all be expressed.
This explorer-like UI would need access to logic that presents the right files for a given user and creates the appropriate Shared Access Signatures as needed. Note that this logic would not need to be hosted in Azure; it would just need access to the proper storage key (from the Azure portal).
