Azure blob storage: Shared access signature for multiple containers?

I'm creating an application that will be hosted in Azure. In this application, users will be able to upload their own content. They will also be able to configure a list of other trusted app users who will be able to read their files. I'm trying to figure out how to architect the storage.
I think that I'll create a storage container named after each user's application ID, and they will be able to upload files there. My question relates to how to grant read access to all files to which a user should have access. I've been reading about shared access signatures, and they seem like they could be a great fit for what I'm trying to achieve. But I'm evaluating the most efficient way to grant access to users. I think that stored access policies might be useful. But specifically:
Can I use one shared access signature (or stored access policy) to grant a user access to multiple containers? I've found one piece of information which I think is very relevant:
http://msdn.microsoft.com/en-us/library/windowsazure/ee393341.aspx
"A container, queue, or table can include up to 5 stored access policies. Each policy can be used by any number of shared access signatures."
But I'm not sure if I'm understanding that correctly. If a user is connected to 20 other people, can I grant him or her access to twenty specific containers? Of course, I could generate twenty individual stored access policies, but that doesn't seem very efficient, and when they first log in, I plan to show a summary of content from all of their other trusted app users, which would equate to demanding 20 signatures at once (if I understand correctly).
Thanks for any suggestions...
-Ben

Since you are going to have a container per user (for now I'll equate a user with what you called a user application ID), you'll have a storage account containing many different containers for many users. If you want the application to be able to upload to only one specific container while reading from many, two options come to mind.
First: Create an API that lives somewhere and handles all the requests. Behind the API, your code will have full access to the entire storage account, so your business logic determines what users do and do not have access to. The upside is that you don't have to create Shared Access Signatures (SAS) at all. Your app only knows how to talk to the API. You can even assemble the summary of content they can see by making parallel calls to the various containers in response to a single call from the application. The downside is that you are now hosting an API service that has to broker ALL of these calls. If you take the SAS route instead, you'd still need an API service, but only to generate the SAS; the client applications would then call the Windows Azure storage service directly, so the storage service bears the load and you need fewer resources of your own.
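A minimal sketch of the broker option, assuming a hypothetical in-memory trust list (`TRUSTED_READERS`) and a placeholder `fetch_latest` standing in for the real storage read. It only illustrates the two points above: the API decides which containers a user may read, and one request from the app fans out to those containers in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical trust list: container owner -> users the owner has marked as trusted readers.
TRUSTED_READERS = {
    "alice": {"bob", "james"},
    "carol": {"bob"},
    "dave": set(),
}


def readable_containers(user_id):
    """Business logic behind the API: the user's own container plus every owner who trusts them."""
    owners = {owner for owner, readers in TRUSTED_READERS.items() if user_id in readers}
    owners.add(user_id)  # users can always read their own container
    return sorted(owners)


def fetch_latest(container):
    """Placeholder for the real storage read, made with the API's full account credentials."""
    return f"{container}/latest.txt"


def content_summary(user_id):
    """One call from the app fans out to every readable container in parallel."""
    containers = readable_containers(user_id)
    with ThreadPoolExecutor(max_workers=8) as pool:
        return dict(zip(containers, pool.map(fetch_latest, containers)))
```

Because the storage credentials never leave the API, removing someone from a trust list takes effect on their very next request.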
Second: Go the SAS route and generate SAS as needed, but this will get a bit tricky.
You can only create up to five stored access policies on each container. You would use one of these five for the "owner" of the container, giving them read and write permissions. Now, since you are allowing folks to give read permissions to other folks, you'll run into the policy count limit unless you reuse the same policy for reads, but then you won't be able to revoke access for just one reader if the user removes someone from their "trusted" list. For example, if I gave both Bob and James permission to read my container and they were both handed a copy of the read SAS, then to remove Bob I'd have to cancel the read policy they shared and reissue a new read SAS to James. That's not really that bad of an issue, though, as the app can detect when it no longer has permissions and ask for a renewed SAS.
In any case, you still kind of want the policies to be short-lived. If I removed Bob from my trusted readers, I'd pretty much want him cut off immediately. This means you'll be going back to get a renewed SAS quite a bit and recreating the shared access signature, which reduces the usefulness of the stored access policies. This really depends on how long you were planning to let the policy live and how quickly you'd want someone cut off once they were "untrusted".
Now, a better option could be to create ad-hoc signatures. You can actually have as many ad-hoc signatures as you want, but they can't be revoked and can last at most one hour (a limit that applied to storage service versions before 2012-02-12). Since you'd make them short-lived, the lifetime limit and the lack of revocation shouldn't be an issue. Going that route means the application comes back to get them as needed, but given what I mentioned above about wanting the SAS to run out soon after someone is removed, this may not be a big deal. As you pointed out, this does increase the complexity because you're generating a lot of SASs; however, since they are ad-hoc, you don't need to track them.
If you were going to go the SAS route, I'd suggest that your API generate the ad-hoc ones as needed. They shouldn't last more than a few minutes, since people can have their permissions to a container removed, and all you are trying to do is take the load of the actual uploads and downloads off your hosted service. Again, all the logic for deciding which containers someone can see still lives in your API service, and the applications just get signatures they can use for short periods of time.
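To make the five-policy arithmetic concrete, here is a toy model (plain Python, not the Azure SDK; `ContainerAcl`, `per_reader_capacity` and `remove_reader` are hypothetical names). It shows that one private read policy per reader leaves room for only four readers next to the owner's policy, and that with a shared read policy, removing one reader means replacing the policy for everyone.

```python
class ContainerAcl:
    """Toy model of one container's stored access policies; not the Azure SDK."""

    MAX_POLICIES = 5  # per-container limit from the documentation quoted in the question

    def __init__(self):
        self.policies = {}  # signed identifier -> permission string, e.g. "rw" or "r"

    def add_policy(self, identifier, permissions):
        if identifier not in self.policies and len(self.policies) >= self.MAX_POLICIES:
            raise ValueError("a container can hold at most 5 stored access policies")
        self.policies[identifier] = permissions

    def revoke(self, identifier):
        # Deleting the policy invalidates every SAS that references this identifier.
        self.policies.pop(identifier, None)


def per_reader_capacity():
    """How many readers fit if each one gets a private read policy next to the owner's policy."""
    acl = ContainerAcl()
    acl.add_policy("owner", "rw")
    readers = 0
    while True:
        try:
            acl.add_policy(f"read-{readers}", "r")
        except ValueError:
            return readers
        readers += 1


def remove_reader(acl, shared_id, next_id):
    """Shared-policy variant: cancel the shared read policy and create a replacement.
    Remaining readers (James) need a new SAS signed against next_id; the removed reader (Bob) gets none."""
    acl.revoke(shared_id)
    acl.add_policy(next_id, "r")
    return next_id
```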
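A sketch of the API handing out ad-hoc, read-only grants on demand, assuming a five-minute lifetime and a hypothetical `demo_sign` helper. The signer is an HMAC-SHA256 over a deliberately simplified string and is NOT Azure's real string-to-sign; in practice the storage SDK would build the SAS. The point is the shape: no grant is recorded anywhere, so revocation simply means the API stops issuing new ones and the old ones expire within minutes.

```python
import base64
import hashlib
import hmac
from datetime import datetime, timedelta, timezone

SAS_LIFETIME = timedelta(minutes=5)  # "a few minutes"; tune to how fast revocation must bite
DEMO_KEY = b"hypothetical-signing-key"  # illustration only, not a real account key


def demo_sign(container, permissions, expiry):
    """Stand-in signer: HMAC-SHA256 over a simplified string, NOT Azure's real string-to-sign."""
    message = f"{permissions}\n{expiry.isoformat()}\n/{container}".encode("utf-8")
    return base64.b64encode(hmac.new(DEMO_KEY, message, hashlib.sha256).digest()).decode("ascii")


def issue_read_grants(user_id, trusted_readers, now=None):
    """Issue one read-only, short-lived grant per container whose owner currently trusts user_id.
    Nothing is stored: removing someone from a trust list just stops new grants being issued."""
    if now is None:
        now = datetime.now(timezone.utc)
    expiry = now + SAS_LIFETIME
    return {
        owner: {"expiry": expiry, "token": demo_sign(owner, "r", expiry)}
        for owner, readers in sorted(trusted_readers.items())
        if user_id in readers
    }
```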

Related

Azure Blob Storage file level security

I have an Azure Blob Storage account whose blobs are PDFs categorized by client number, so each client has multiple PDF reports. I only want each client to be able to access the blobs for their own client number. (There are hundreds of clients.)
I've researched this, but I only see shared access signatures, and they don't look like what I need.
There are no user-level blob permissions, other than Shared Access Signatures (and policies).
It's going to be up to you to manage access to specific user content (and how you manage that is really up to you and your app, and how you manage a user's content metadata).
When providing a link to a user's content: if you assume all content is always private, then simply create an on-demand SAS link when requested. There's no way for the user to modify a SAS link to guess sequential numbers or neighboring blobs, since the SAS is for a specific URL.
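Why a SAS can't be edited to reach a neighboring blob: the signature is an HMAC-SHA256, keyed with the base64-decoded account key, over a string that includes the resource path. The sketch below uses a hypothetical key and a simplified string-to-sign (the real one has more fields, and the service also enforces the expiry, which is omitted here), but it shows that changing the client number or the permissions breaks the signature. Note that computing the signature is purely local; no call to the storage service is made.

```python
import base64
import hashlib
import hmac

# Hypothetical account key; a real Azure key is base64 text that is decoded before signing.
ACCOUNT_KEY = base64.b64encode(b"demo-key-bytes").decode("ascii")


def sign(resource_path, permissions, expiry):
    """Simplified SAS-style signature; the real string-to-sign has more fields."""
    string_to_sign = "\n".join([permissions, expiry, resource_path])
    key = base64.b64decode(ACCOUNT_KEY)
    digest = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")


def verify(resource_path, permissions, expiry, signature):
    """What the service conceptually does: recompute the signature and compare in constant time."""
    return hmac.compare_digest(sign(resource_path, permissions, expiry), signature)
```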
As Andrés suggested, you could also use your app to stream blob content, and never worry about SAS. However, you will now be consuming resources of your web app (network, CPU, memory), and this will have an impact on your app's scale requirements. You will no longer be able to offload this to the storage service.
Sounds like you already have the users authenticate, and you know which pdfs belong to them. My suggestion is to add to your current application a simple proxy (for instance if you have an MVC application, you could add a new controller and action method that will retrieve the pdfs on behalf of the user).
This way you don't need to use shared access signature and can keep the blob container private. Your controller/action method would simply use the storage SDK to retrieve the blob. An added bonus is that you could check to make sure that they are requesting their own PDF file and reject the request if they guess the ID of someone else's file.
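A minimal sketch of the proxy action's authorization step, assuming a hypothetical naming scheme of `<client number>/<report file>` inside one private container named `reports`. `download_blob` is a placeholder for whatever storage SDK call the app uses; the check runs before any storage access, so guessing another client's file ID is rejected.

```python
class Forbidden(Exception):
    """Raised when a user asks for a PDF outside their own client folder."""


def blob_name_for(client_number, report_name):
    # Hypothetical naming scheme: "<client number>/<report file>" inside one private container.
    return f"{client_number}/{report_name}"


def get_report(authenticated_client, requested_client, report_name, download_blob):
    """Proxy action: authorize first, then read the blob with the app's own storage credentials."""
    if requested_client != authenticated_client:
        raise Forbidden(f"client {authenticated_client} may not read reports of {requested_client}")
    if "/" in report_name or ".." in report_name:
        raise Forbidden("invalid report name")  # block attempts to escape the client's folder
    return download_blob("reports", blob_name_for(requested_client, report_name))
```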

Can Application Insights merge Azure.Mobile.Server.Files activity with Azure Storage blob diagnostic logs?

When a client calls GetStorageTokenAsync on the server, it gets a token that can read, write, or delete objects on the target container.
The activity done on this container is more or less hidden from my application unless I scan the logs.
Therefore I'm left to guess, or to do some cumbersome programming to determine what the files were, what they contain, what changed, etc.
I want to gather empirical evidence of what a given user ID did with a certain known Shared Access Signature token, and aggregate that into either an administrative console like Application Insights or some other tool that will allow a programmatic response to the user's actions.
Question
What is the best way to align the actions a user takes with a given Shared Access Signature (specifically in the context of the NuGet package Microsoft.Azure.Mobile.Server.Files)?
and aggregate them into either an administrative console like Application Insights, or some other tool that will allow programmatic response to the user's actions.
There is currently no tool or service that supports this; I'm afraid you will need to build it yourself. Storage Analytics logs are stored in the $logs container of your storage account, and each log file is plain text with one logged request per line.
What is the best way to align the actions a user takes, with a given Shared Access Signature
Based on the Storage Analytics Log Format, the only relevant information is the 'authentication-type' field, which tells you whether a storage operation was authenticated with a Shared Access Signature or with the account key (for SAS requests its value is 'sas'). The log does not tell you which specific Shared Access Signature was used for an operation.
The Azure Storage Analytics log also contains the requester-ip-address field, which can identify the client that sent the storage request. If the client's IP address is static, it will help you find all the actions a specific user took.
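A sketch of the do-it-yourself aggregation described above: read log lines (already downloaded from $logs), keep the SAS-authenticated ones, and group them by requester IP. The field positions below are an assumption based on the version 1.0 Storage Analytics log format (semicolon-delimited, with fields that may contain semicolons wrapped in double quotes); please verify them against the official format description before relying on this. The port-stripping helper only handles IPv4 `address:port` values.

```python
import csv
from collections import defaultdict

# Assumed 0-based field positions in the version 1.0 log format; verify against the docs.
OPERATION_TYPE = 2
AUTHENTICATION_TYPE = 7
REQUESTED_OBJECT_KEY = 12
REQUESTER_IP_ADDRESS = 15


def client_host(address):
    """Drop the ':port' suffix from an IPv4 'a.b.c.d:port' value; leave anything else untouched."""
    return address.rsplit(":", 1)[0] if address.count(":") == 1 else address


def sas_operations_by_ip(log_lines):
    """Group SAS-authenticated operations by requester IP. The log cannot say WHICH SAS was used."""
    grouped = defaultdict(list)
    # Entries are semicolon-delimited; quoted fields may themselves contain semicolons.
    for fields in csv.reader(log_lines, delimiter=";", quotechar='"'):
        if len(fields) <= REQUESTER_IP_ADDRESS or fields[AUTHENTICATION_TYPE].lower() != "sas":
            continue
        host = client_host(fields[REQUESTER_IP_ADDRESS])
        grouped[host].append((fields[OPERATION_TYPE], fields[REQUESTED_OBJECT_KEY]))
    return dict(grouped)
```

The result could then be pushed into whatever console you use; correlating an IP with a user ID still has to come from your own records of who requested which SAS.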

Windows Azure - Shared Access Signature (SAS URI)

Here are three questions for you!
Is it possible to revoke an active SAS URI without refreshing storage key or using Stored Access Policy?
In my application, all users share the same blob container. Because of this, using a stored access policy (max 5 per container) or regenerating the storage key (which would invalidate ALL SAS URIs) is not an option for me.
Is it possible to show custom errors if the SAS URI is incorrect or expired?
This is the default page:
If I let users create their own SAS URI for uploading/downloading, do I need to think about setting restrictions? Can this be abused?
Currently, in my application, there are restrictions on how much you are allowed to upload, but no restrictions on how many SAS URIs you are allowed to create. Users can acquire as many SAS URIs as they like, as long as they don't complete their upload or exceed the allowed stored bytes.
How do real file-sharing websites deal with this?
How much does a SAS URI cost to create?
Edit - Clarification of question 3.
Before you can upload or download a blob you must first get the SAS URI. I was wondering if it's "expensive" to create a SAS URI. Imagine a user exploiting this, creating a SAS URI over and over again without finishing the upload/download.
I was also wondering how real file-sharing websites deal with this. It's easy to store information about how much storage the user is using and use that to enforce restrictions, but if a user keeps uploading files to 99%, then cancels and starts again, over and over, I imagine it would cost a lot for the host.
To answer your questions:
No. Ad-hoc SAS tokens (i.e., tokens not associated with a stored access policy) can't be revoked except by regenerating the storage account key used to sign them.
No. At this time it is not possible to customize the error message; the standard error returned by the storage service will be shown.
You need to provide more details regarding question 3. As it stands, I don't think we have enough information to comment.
UPDATE
Regarding your question about how expensive it is to create a SAS URI: creating a SAS URI does not involve a REST API call to the storage service (the signature is computed locally with the account key), so no storage transaction is involved. From the storage side, therefore, creating a SAS URI costs nothing. Assuming your service is a web application, the only cost I can think of is the user's call to your service to create the SAS URI.
Regarding your comment about how real file sharing websites deal with it, I think unless someone with a file sharing website answers it, it would be purely speculative.
(My Speculative response :)) If I were running a file sharing website, I would not worry too much about this kind of thing simply because folks don't have time to "mess around" with your site/application. It's not that the users would come to your website with an intention of "let's just upload files till the upload is 99%, cancel the upload and do that again" :). But again, it is purely a speculative response :).

Better Understanding of Purpose of Signed Identifiers in Blob Storage (Azure)

I am trying to understand the purpose of signed identifiers when using Shared Access Signatures with Blob storage in Azure. I know that signed identifiers are applied at the container level and are named. Furthermore, I know that they allow shared access signatures to be valid for longer than an hour (as opposed to when no signed identifier is specified). My question is: couldn't you just apply a shared access signature at the container level with appropriate permissions and expiry time? Thanks to all who reply.
Okay, I think I get it now. So the best way to interpret SIs is as another level of abstraction for access control at the container level. Furthermore, they allow you to specify how long policies can be applied before they are revoked. With both explicit and SI declarations, revocation is pretty much the expiry time.
So my next question: say I have a policy that has been compromised. How exactly do I immediately revoke or change the policy (given that I've defined this policy in my code, how would I change it without having to redeploy code)?
The Signed Identifier is how you reference an ACL on a particular Container. These are required for you to create revocable access to your blobs.
If you create an ad-hoc SAS (one without a signed identifier) whose expiry time is more than one hour away, the Blob service could return a 400 Bad Request error, or simply ignore the expiry time and cap it at one hour (this one-hour limit applied to service versions before 2012-02-12).
This is done as part of the platform to ensure that your data is secure.
There is more information about the lifetime of a SAS in the MSDN Library.
The main reason for the signed identifier, as opposed to explicitly specifying all parameters, has to do with security. If for some reason a SAS was created that had all the parameters specified and had a valid HMAC signature, the blob service would honor it. Imagine there was no limit to expiry time. Now, imagine it leaks. In the normal case, it can only do damage for up to an hour. Remember, you have specified all the parameters in it, so you cannot change it. If you could specify an unlimited time, it could not be revoked without actually changing your main storage key (that would invalidate the signature and break all existing SAS). The SI gives you one more layer of abstraction to avoid having to roll storage keys.
The signed identifier (or policy as I like to call them) is the way to extend past an hour and still be able to a.) immediately revoke if necessary or b.) immediately change. With the SI, you can change the permissions, you can delete it, you can change the expiry, all of which give you greater control over the life and access of your existing SAS (the ones that use SI anyway).
Actually, I've just answered my own question. I can write code that references the containers in question and clears out the access policies currently set on any container.
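A toy model (plain Python, not the Azure SDK; all names are hypothetical) of why this works: a policy-based SAS carries only the signed identifier, and the permissions and expiry are looked up on the container when the SAS is used. Shortening a policy's expiry or clearing a container's policies therefore affects every outstanding SAS that references them, with no redeploy. In the current Python SDK, I believe the corresponding real call is `ContainerClient.set_container_access_policy` with an empty `signed_identifiers` mapping; please check the SDK reference before relying on that.

```python
from datetime import datetime, timedelta, timezone


class Container:
    """Toy model: stored access policies live on the container, keyed by signed identifier."""

    def __init__(self, name):
        self.name = name
        self.policies = {}  # signed identifier -> (permissions, expiry)


def is_request_allowed(container, signed_identifier, permission, now):
    """The SAS carries only the identifier; permissions and expiry are looked up at use time."""
    policy = container.policies.get(signed_identifier)
    if policy is None:
        return False  # policy deleted: every SAS referencing it is revoked
    permissions, expiry = policy
    return permission in permissions and now < expiry


def clear_policies(containers):
    """The approach described above: wipe the stored access policies on every affected container."""
    for container in containers:
        container.policies.clear()
```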

Where do you store your db password and get it in your J2EE app?

How do you make sure it is secure when there are some devs who can access the machine?
Barring the whole discussion about not storing passwords in files, you use the machine's own ACLs to prevent them from accessing it.
Make the file readable only by the admin account, or by some other account used to run your software. Then don't give the developers the admin or service account credentials.
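The ACL itself (owner account, mode 600) is set with the operating system's tools; as a small supplementary safeguard, the application can refuse to read a secret file whose permissions have drifted. A minimal POSIX-only sketch (the function name is hypothetical):

```python
import os
import stat


def read_db_password(path):
    """Read a secret file only if no group/other permission bits are set (POSIX only)."""
    mode = os.stat(path).st_mode
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError(f"{path} is accessible to group/others (mode {oct(mode & 0o777)})")
    with open(path, encoding="utf-8") as handle:
        return handle.read().strip()
```

This does not stop anyone who can log in as the service account; it only catches a file that was accidentally made readable to everyone.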
The bigger question is, if you are concerned about them accessing the file on your machine, why do they have access to said machine? Any developer that is able to replace the code on the server without checks will be able to access your database.
Let's look at a nice real-world example of why you would want to do something like this.
You hire developers to create a Bank of Stackoverflow website. For whatever reason, you store all your clients' account information, including SSNs, in a single database that needs to be accessed by the Bank of Stackoverflow website.
Do not give developers permission to put code directly onto a live machine
Do not give developers the access information to the database.
All code has to go onto a staging machine to be verified. For the most part, it is easy enough to let developers use staging databases containing fake client information.
It is the responsibility of vetted engineers to move products from the staging machine to the production machine.
I did not completely understand your problem, but I think the following article is for you:
Data Storage Security in J2EE
