Avoid overwriting blobs in Azure

If I upload a file to an Azure blob container where a file with the same name already exists, the existing file is overwritten. How can I avoid this? Below is the scenario:
Step 1 - upload file "abc.jpg" to a container called, say, "filecontainer"
Step 2 - once it is uploaded, upload a different file with the same name to the same container
Output - the existing file is overwritten by the latest upload
My requirement - I want to avoid this overwrite, as different people may upload files with the same name to my container.
Please help.
P.S.
- I do not want to create different containers for different users
- I am using the REST API with Java

Windows Azure Blob Storage supports conditional headers, which you can use to prevent blobs from being overwritten. You can read more about conditional headers here: http://msdn.microsoft.com/en-us/library/windowsazure/dd179371.aspx.
Since you want the blob not to be overwritten, you would need to specify the If-None-Match conditional header and set its value to *. This causes the upload operation to fail with a Precondition Failed (412) error if the blob already exists.
Another idea would be to check for the blob's existence just before uploading (by fetching its properties); however, I would not recommend this approach as it can lead to concurrency issues.
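Since the question mentions the REST API with Java, here is a minimal sketch of setting that header on a Put Blob request. The account URL and SAS token are placeholders, and the request is only built, not sent:

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class ConditionalUpload {
    // Build a Put Blob request that will fail with 412 (Precondition Failed)
    // if the blob already exists, because of the If-None-Match: * header.
    static HttpRequest buildPutRequest(String blobUrlWithSas, byte[] content) {
        return HttpRequest.newBuilder()
                .uri(URI.create(blobUrlWithSas))
                .header("x-ms-blob-type", "BlockBlob")  // required by Put Blob
                .header("If-None-Match", "*")           // only succeed if the blob does not exist
                .PUT(HttpRequest.BodyPublishers.ofByteArray(content))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = buildPutRequest(
                "https://myaccount.blob.core.windows.net/filecontainer/abc.jpg?sv=placeholder",
                new byte[] {1, 2, 3});
        System.out.println(req.headers().firstValue("If-None-Match").orElse(""));
        // prints *
    }
}
```

On a real call you would send the request with an HttpClient, catch the 412 status, and retry with a different name.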

You have no control over the names your users give their files. You do, however, have control over the names you store those files under. The standard approach is to generate a GUID and name each file accordingly; the chance of a conflict is practically zero.
Simple pseudocode looks like this:
// generate a GUID and rename the uploaded file to that GUID
// store the original file name together with the GUID in a database or similar
// upload the file to blob storage under the generated name
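In plain Java the pseudocode above might look like this (the generateBlobName helper and the extension handling are my own illustration):

```java
import java.util.UUID;

public class BlobNaming {
    // Generate a storage name from a random GUID, preserving the file extension.
    static String generateBlobName(String originalFileName) {
        String extension = "";
        int dot = originalFileName.lastIndexOf('.');
        if (dot >= 0) {
            extension = originalFileName.substring(dot); // includes the dot, e.g. ".jpg"
        }
        return UUID.randomUUID().toString() + extension;
    }

    public static void main(String[] args) {
        String blobName = generateBlobName("abc.jpg");
        // Store the mapping blobName -> "abc.jpg" in your database,
        // then upload the file under blobName.
        System.out.println(blobName); // e.g. 550e8400-e29b-41d4-a716-446655440000.jpg
    }
}
```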
Hope that helps.

Let me put it this way:
Step one - user X uploads file "abc1.jpg" and you save it to a local folder XYZ
Step two - user Y uploads another file with the same name "abc1.jpg", and you save it again to the local folder XYZ
What do you do now?
I am illustrating that your question does not really relate to Azure at all.
Just do not rely on original file names when saving files, wherever you are saving them. Generate random names (GUIDs, for example) and attach the original name as metadata.


How to ensure a blob filename is unique on Azure Storage

I want to ensure the files I put on Azure Storage are unique. My naive, badly performing approach is to use a Java UUID to generate a unique id, check whether the blob exists, and write the file if it does not (or regenerate the name and try again otherwise). This requires two round trips... is there a better way? One would hope Azure could do this.
I'm using the azure-storage-blob Java SDK 12.8.0.
Azure itself does not have a built-in feature for this.
Your solution should be the best one: use a UUID as the file name (since a UUID is globally unique, there is only a vanishingly small chance of a duplicate) and then check whether it exists.
Otherwise, you would need to list all the blobs first and store their names locally, e.g. in a list; when uploading a new file, check the name against that list to determine whether it is already taken.
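The check-and-regenerate loop can be sketched like this; the predicate stands in for a real existence check such as BlobClient.exists() in the 12.x SDK:

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;
import java.util.function.Predicate;

public class UniqueNames {
    // Keep generating UUID names until one is not taken.
    // 'exists' stands in for a round trip to storage (e.g. BlobClient.exists()).
    static String uniqueName(Predicate<String> exists) {
        String name = UUID.randomUUID().toString();
        while (exists.test(name)) {
            name = UUID.randomUUID().toString();
        }
        return name;
    }

    public static void main(String[] args) {
        Set<String> taken = new HashSet<>(Set.of("already-taken"));
        String name = uniqueName(taken::contains);
        System.out.println(name.length()); // 36
    }
}
```

In practice the loop almost never iterates more than once, which is exactly why the UUID approach works well.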

How can I attach all content files from folders in my blob container with a Logic App?

I have a blob container "image-blob", and I create a folder blob containing the OCR'd image text and the image itself (two files: image.txt, with the text of the image, and image.png). The container has multiple folders, each containing both files. How can I make a Logic App that sends an email with both files of every folder? (That would be one email per folder, with 2 attachments.) The name of each folder is generated randomly, and every file has the name of its folder plus the extension.
I've tried making a condition with the isFolder() method, but nothing happens.
You could try List blobs in root folder if your folders are in the root of the container; if not, you could use List blobs.
If you use List blobs in root folder, the List blobs action gives you all the blob info, and you can then add an action such as Get blob content using path.
If you use List blobs instead, only the first step is different: you need to specify the container path. The other steps are just like List blobs in root folder.
In my test, I added the Get blob content using path action.
It does get all the blobs; however, because of the For each action you can only process them one by one, so in your situation you may need to store the information you need in a file and then send the whole sheet from that file.
Hope this helps; if you still have other questions, please let me know.
How can I make a Logic App in which it sends an email with both files of every folder?
It's hard to attach two files to one email; sending an email per file of every folder is more straightforward.
If you still have any problems, please feel free to let me know.

Revisions in google cloud storage

I want to save my files to Google Cloud Storage. I have stored my files with names like doc_personId_fileId. But now, if a user uploads another file, the old file will be replaced. I want to keep revisions. What is the best approach to keeping a record of all the revisions? For example:
I have a file named doc_1_1. Now the user uploads another file. The old file should be renamed doc_1_1_revision_1, then doc_1_1_revision_2, and so on, and the new file should be doc_1_1.
What is the best way to do this?
Or does Google provide anything to handle this type of scenario?
Thanks.
You want to upload doc_1_1 a few times, for example 3 times, and expect your bucket to look like:
doc_1_1
doc_1_1_revision_3
doc_1_1_revision_2
. . .
In short, you cannot achieve this automatically with what GCP provides; it requires your upload code to perform 2 operations:
move the old file to a name that includes the revision
upload the new file
Alternatively, GCP supports object revisions using two concepts: generation, on the object itself, and metageneration, on the metadata associated with the object. So you can simply keep uploading new files, pay no attention to the older revisions, and leave it to GCP to handle. Listing files with the option to show generations and metadata will give you all files and revisions.
Of course, you can also restore / retrieve a file by specifying the revision.
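The two-step workaround can be sketched like this; computing the next revision name from a bucket listing is my own illustration:

```java
import java.util.List;

public class Revisions {
    // Given the base name and the object names currently in the bucket,
    // compute the revision name the current file should be moved to
    // before the new file is uploaded under the base name.
    static String nextRevisionName(String baseName, List<String> existingNames) {
        int maxRevision = 0;
        String prefix = baseName + "_revision_";
        for (String name : existingNames) {
            if (name.startsWith(prefix)) {
                try {
                    maxRevision = Math.max(maxRevision,
                            Integer.parseInt(name.substring(prefix.length())));
                } catch (NumberFormatException ignored) {
                    // not a revision of this file
                }
            }
        }
        return prefix + (maxRevision + 1);
    }

    public static void main(String[] args) {
        List<String> bucket = List.of("doc_1_1", "doc_1_1_revision_1");
        // Step 1: copy the current doc_1_1 to this name; step 2: upload the new doc_1_1.
        System.out.println(nextRevisionName("doc_1_1", bucket)); // doc_1_1_revision_2
    }
}
```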
Your goal is:
I have a file named doc_1_1. Now if user uploads another file. Old file should be named as doc_1_1_revision_1 and after that doc_1_1_revision_2 and so on and new file should be doc_1_1.
Google Cloud Storage does not support this naming technique. You will have to implement this on the client side as part of your upload process.
Another option is to enable "Object Versioning" where previous objects with the same name still persist. The last uploaded instance is the "current" version.
This link will help you understand object versions:
Object Versioning

Azure blob upload rename if blob name exist

In Azure blob upload, a file is overwritten if you upload a new file with the same file name (in the same container).
I would like to rename the new file before saving it, to avoid overwriting any files - Is this possible?
Scenario:
Upload file "Image.jpg" to container "mycontainer"
Upload file "Image.jpg" to container "mycontainer" (with different content)
Rename the second "Image.jpg" to "Image_{guid}.jpg" before saving it to "mycontainer".
You cannot rename a blob (there's no API for it). Your options:
check if the blob name exists prior to uploading, and choose a different name for your about-to-be-uploaded blob if the name is already in use
simulate a rename by copying the existing blob to a new blob with a different name, then deleting the original
As @juunas pointed out in the comments, you'd have to manage your workflow to avoid a potential race condition between checking for existence, renaming, etc.
I recommend using an "If-None-Match: *" conditional header (sometimes known as "If-Not-Exists" in the client libraries). If you include this header on either your PutBlob or PutBlockList operations, the call will fail and data will not be overwritten. You can catch this client-side and retry the upload operation (with a different blob name.)
This has two advantages over checking to see if the blob exists before uploading. First, you no longer have the potential race condition. Second, calling Exists() adds a lot of additional overhead - an additional HTTP call for every upload, which is significant unless your blobs are quite large or latency doesn't matter. With the access condition, you only need multiple calls when the name collides, which should hopefully be a rare case.
Of course, it might be easier / cleaner to just always use the GUID, then you don't have to worry about it.
Needing to rename may be indicative of an anti-pattern. If your ultimate goal is to change the name of the file when downloaded, you can do so and keep the blob name abstract and unique.
You can set the http download filename by assigning ContentDisposition property with
attachment;filename="yourfile.txt"
This ensures the header is set whether the blob is accessed publicly or via a SAS URL.
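For example (a sketch; with the 12.x Java SDK this value would be assigned via BlobHttpHeaders.setContentDisposition on the blob):

```java
public class Disposition {
    // Build the Content-Disposition value that forces a friendly download name,
    // independent of the abstract/unique blob name.
    static String attachmentDisposition(String downloadFileName) {
        return "attachment;filename=\"" + downloadFileName + "\"";
    }

    public static void main(String[] args) {
        System.out.println(attachmentDisposition("yourfile.txt"));
        // prints attachment;filename="yourfile.txt"
    }
}
```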

Amazon S3 Browser Based Upload - Prevent Overwrites

We are using Amazon S3 for images on our website, and users upload images/files directly to S3 through our website. In our policy file we ensure the key "begins-with" "upload/". Anyone can see the full URLs of these images, since they are publicly readable once uploaded. Could a hacker use the policy data in the JavaScript and the URL of an image to overwrite these images with their own data? I see no way to prevent overwrites after the initial upload. The only solution I've seen is to copy/rename the file into a folder that is not publicly writeable, but that requires downloading the image and then uploading it again to S3 (since Amazon can't really rename in place).
If I understood you correctly, the images are uploaded to Amazon S3 via your server application.
In that case only your application has S3 write permission; clients can upload images only through your application (which stores them on S3). A hacker could only force your application to upload an image with the same name and overwrite the original one.
How do you handle the situation when a user uploads an image with a name that already exists in your S3 storage?
Consider the following sequence of actions:
First user uploads an image some-name.jpg
Your app stores that image in S3 under the name upload-some-name.jpg
Second user uploads an image some-name.jpg
Will your application overwrite the original one stored in S3?
I think the question implies the content goes directly to S3 from the browser, using a policy file supplied by the server. If that policy file has an expiration set, for example one day in the future, the policy becomes invalid after that. Additionally, you can set a starts-with condition on the writeable path.
So the only way a hacker could use your policy files to maliciously overwrite files is to obtain a new policy file, and then overwrite files only in the path it specifies. But at that point you have the chance to refuse to provide the policy file, since I assume handing one out happens after authenticating your users.
So in short, I don't see a danger here if you are handing out properly constructed policy files and authenticating users before doing so. No need to make copies of anything.
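A browser-upload POST policy of the kind described might look like this (the bucket name, key prefix, expiration, and size limit are all illustrative):

```json
{
  "expiration": "2024-01-02T00:00:00Z",
  "conditions": [
    {"bucket": "my-images-bucket"},
    ["starts-with", "$key", "upload/"],
    {"acl": "public-read"},
    ["content-length-range", 0, 10485760]
  ]
}
```

The server signs this document and hands the signature to the browser; once the expiration passes, the signed policy can no longer be used to write anywhere.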
Actually, S3 does have a copy feature that works great:
Copying Amazon S3 Objects
But as amra stated above, doubling your space usage by copying sounds inefficient.
Maybe it would be better to give the object some kind of unique id, like a GUID, and set additional user metadata beginning with "x-amz-meta-" for more information about the object, such as the user who uploaded it, the display name, etc.
On the other hand, you could always check whether the key already exists and report an error.
