How to ensure a blob filename is unique on Azure Storage

I want to ensure the files I put on Azure Storage are unique. My naive and poorly performing approach is to use a Java UUID to generate a unique id, check whether a blob with that name exists, and then write the file if not (or regenerate the name and try again otherwise). This requires two round trips... is there a better way? One would hope Azure could do this.
I'm using the azure-storage-blob Java SDK 12.8.0

Azure itself does not have a feature to do this.
Your solution is about the best one: use a UUID as the file name (UUIDs are globally unique, with only a vanishingly small chance of a duplicate) and then check whether the blob exists.
Otherwise, you would need to list all the blobs first and store their names locally, e.g. in a list; then, when uploading a new file, check the name against that local list to determine whether it is already taken.
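A minimal sketch of that approach with the azure-storage-blob 12.8.0 SDK the question mentions (the connection-string variable and container name are placeholders). Note that passing overwrite = false makes the SDK send an If-None-Match: * header, so even the rare race between the existence check and the upload fails safely instead of overwriting:

import com.azure.storage.blob.BlobClient;
import com.azure.storage.blob.BlobContainerClient;
import com.azure.storage.blob.BlobContainerClientBuilder;
import java.io.ByteArrayInputStream;
import java.util.UUID;

public class UniqueUpload {
    public static void main(String[] args) {
        // the connection-string variable and container name are placeholders
        BlobContainerClient container = new BlobContainerClientBuilder()
                .connectionString(System.getenv("AZURE_STORAGE_CONNECTION_STRING"))
                .containerName("mycontainer")
                .buildClient();

        byte[] data = "example content".getBytes();
        BlobClient blob;
        do {
            // fresh UUID per attempt; a collision is astronomically unlikely
            blob = container.getBlobClient(UUID.randomUUID() + ".jpg");
        } while (blob.exists());

        // overwrite = false makes the SDK send If-None-Match: *, so the rare
        // race between exists() and upload() fails instead of overwriting
        blob.upload(new ByteArrayInputStream(data), data.length, false);
    }
}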

Related

Revisions in google cloud storage

I want to save my files to Google Cloud Storage. I have stored my files with names like doc_personId_fileId. But if my user now uploads another file, the old file will be replaced. I want to keep revisions. What is the best approach to keep a record of all the revisions? For example:
I have a file named doc_1_1. Now if the user uploads another file, the old file should be renamed doc_1_1_revision_1, then doc_1_1_revision_2 and so on, and the new file should be doc_1_1.
What is the best method to achieve this?
Or is there anything provided by Google to handle this type of scenario?
Thanks.
You want to upload doc_1_1 a few times, for example 3 times, and expect your bucket to look like:
doc_1_1
doc_1_1_revision_3
doc_1_1_revision_2
. . .
In short, you cannot achieve this automatically with what GCP provides; you would need to adapt your upload code to perform two operations:
move the old file to a name that includes the revision
upload the new file
Alternatively, GCP supports object revisions natively using two concepts: generation, on the object itself, and metageneration, on the metadata associated with the object. So you can keep uploading new files and leave the older revisions for GCP to handle. Listing files with the option to show generations and metadata will give you all files and their revisions.
Of course, you can also restore / retrieve a file by specifying the revision.
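A short sketch of that listing with the google-cloud-storage Java client ("my-bucket" is a placeholder, and Object Versioning must already be enabled on the bucket):

import com.google.cloud.storage.Blob;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

public class ListRevisions {
    public static void main(String[] args) {
        Storage storage = StorageOptions.getDefaultInstance().getService();
        // versions(true) also returns the noncurrent generations of each object
        for (Blob blob : storage.list("my-bucket",
                Storage.BlobListOption.versions(true)).iterateAll()) {
            System.out.printf("%s generation=%d%n", blob.getName(), blob.getGeneration());
        }
    }
}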
Your goal is:
I have a file named doc_1_1. Now if user uploads another file. Old file should be named as doc_1_1_revision_1 and after that doc_1_1_revision_2 and so on and new file should be doc_1_1.
Google Cloud Storage does not support this naming technique. You will have to implement this on the client side as part of your upload process.
Another option is to enable "Object Versioning", where previous objects with the same name persist as older versions. The most recently uploaded instance is the "current" version.
This link will help you understand object versions:
Object Versioning
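If you go the versioning route, enabling it is a one-time bucket update; a hedged sketch with the google-cloud-storage Java client ("my-bucket" is a placeholder):

import com.google.cloud.storage.BucketInfo;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

public class EnableVersioning {
    public static void main(String[] args) {
        Storage storage = StorageOptions.getDefaultInstance().getService();
        // keep overwritten objects around as noncurrent generations
        storage.update(BucketInfo.newBuilder("my-bucket")
                .setVersioningEnabled(true)
                .build());
    }
}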

Azure blob upload rename if blob name exist

In Azure blob upload, a file is overwritten if you upload a new file with the same file name (in the same container).
I would like to rename the new file before saving it, to avoid overwriting any files - Is this possible?
Scenario:
Upload file "Image.jpg" to container "mycontainer"
Upload file "Image.jpg" to container "mycontainer" (with different content)
Rename the second "Image.jpg" to "Image_{guid}.jpg" before saving it to "mycontainer".
You cannot rename a blob (there's no API for it). Your options:
check if the blob name exists prior to uploading, and choose a different name for your about-to-be-uploaded blob if the name is already in use
simulate a rename by copying the existing blob to a new blob with a different name, then deleting the original blob
As @juunas pointed out in the comments: you'd have to manage your workflow to avoid a potential race condition between checking for existence, renaming, and so on.
I recommend using an "If-None-Match: *" conditional header (sometimes known as "If-Not-Exists" in the client libraries). If you include this header on your Put Blob or Put Block List operation, the call will fail when the blob already exists and the data will not be overwritten. You can catch this failure client-side and retry the upload with a different blob name.
This has two advantages over checking whether the blob exists before uploading. First, you eliminate the potential race condition. Second, calling Exists() adds significant overhead: an extra HTTP call for every upload, which matters unless your blobs are quite large or latency is unimportant. With the access condition, you only need extra calls when a name actually collides, which should be rare.
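A sketch of that pattern with the v12 Java SDK used elsewhere on this page (the helper name is mine; BlobRequestConditions.setIfNoneMatch("*") is what puts the conditional header on the upload):

import com.azure.storage.blob.BlobContainerClient;
import com.azure.storage.blob.models.BlobRequestConditions;
import com.azure.storage.blob.models.BlobStorageException;
import com.azure.storage.blob.options.BlobParallelUploadOptions;
import java.io.ByteArrayInputStream;
import java.util.UUID;

public class NoOverwriteUpload {
    // upload under a random name; on a 412 collision, retry with a new name
    static String upload(BlobContainerClient container, byte[] data, String extension) {
        while (true) {
            String name = UUID.randomUUID() + extension;
            try {
                container.getBlobClient(name).uploadWithResponse(
                        new BlobParallelUploadOptions(new ByteArrayInputStream(data), data.length)
                                .setRequestConditions(new BlobRequestConditions().setIfNoneMatch("*")),
                        null, null);
                return name; // success, with no separate Exists() round trip
            } catch (BlobStorageException e) {
                if (e.getStatusCode() != 412) throw e; // 412 = name already taken
            }
        }
    }
}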
Of course, it might be easier / cleaner to just always use the GUID, then you don't have to worry about it.
Needing to rename may be indicative of an anti-pattern. If your ultimate goal is to change the name of the file when downloaded, you can do so and keep the blob name abstract and unique.
You can set the HTTP download filename by assigning the ContentDisposition property with
attachment;filename="yourfile.txt"
This ensures the header is set when the blob is accessed via either a public or a SAS URL.
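A minimal sketch with the v12 Java SDK (the helper is hypothetical; note that setHttpHeaders replaces all of the blob's HTTP headers, so re-set the content type here as well if you need it):

import com.azure.storage.blob.BlobContainerClient;
import com.azure.storage.blob.models.BlobHttpHeaders;

public class DownloadFilename {
    // keep the stored blob name opaque and unique, but control the filename
    // the browser offers when the blob is fetched via a public or SAS URL
    static void setDownloadName(BlobContainerClient container, String blobName, String fileName) {
        container.getBlobClient(blobName).setHttpHeaders(new BlobHttpHeaders()
                .setContentDisposition("attachment; filename=\"" + fileName + "\""));
    }
}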

Azure App service dependent calls

I have a class structure as following:
UserDepartments(1)->(n)Categories(1)->(n)Templates(1)->(n)reports
I am using Azure offline data sync with incremental sync. There are 2 major issues we are facing with this.
The code is here
Issues:
Is there a better way of downloading all this related content than doing a foreach inside a foreach?
Intermittently we see that not all the content changed on the server by another web app downloads and syncs correctly when incremental sync is on. Is there a way to flush the cached list created by the key (the first parameter in PullAsync) used in incremental sync? Or do you see something we need to change to make sure we download the correct data on each sync?
Is there a better way of downloading all this related content than doing a foreach inside a foreach?
Pull is performed on a per-table basis; there is no way to download all the related content in a single operation.
Is there a way to flush the cached list created by the key (the first parameter in PullAsync) used in incremental sync?
Incremental sync is supported by default by the PullAsync method if you pass a non-null value for the queryId parameter. But there are two points to pay attention to (see the sketch after this list):
The queryId must be unique for each different pull.
The field you filter on in the latter (query) parameter must support sorting.
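A hedged sketch of per-table pulls, assuming the Azure Mobile Apps Android SDK's MobileServiceSyncTable.pull(query, queryId) (the Java analogue of the .NET PullAsync; the model classes and queryIds below are stand-ins for the question's own):

import com.microsoft.windowsazure.mobileservices.MobileServiceClient;
import com.microsoft.windowsazure.mobileservices.table.sync.MobileServiceSyncTable;

public class SyncSketch {
    public static class Category { public String id; }
    public static class Template { public String id; }

    static void pullAll(MobileServiceClient client) throws Exception {
        MobileServiceSyncTable<Category> categories = client.getSyncTable(Category.class);
        MobileServiceSyncTable<Template> templates = client.getSyncTable(Template.class);
        // each pull gets its own queryId; a null query pulls all records
        categories.pull(null, "allCategories").get();
        templates.pull(null, "allTemplates").get();
        // changing a queryId effectively abandons the old incremental-sync
        // bookmark, forcing a full re-download for that table on the next pull
    }
}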

Azure Storage copy an image from blob to blob

I am using Azure Storage for Node.js, and what I need to do is copy an image from one blob to another.
First I tried getBlobToFile to fetch the image to a temp location on disk, and then createBlockBlobFromFile from that temp location. That method did the task, but for some reason the copy was incomplete in about 10% of cases.
Then I tried getBlobToText and fed the result into createBlockBlobFromText, also passing options to mark the blob as an image. That method failed completely; the image would not even open after the copy.
Perhaps there is a way to copy a blob and paste it into another blob, but I did not find such a method.
What else can I do?
I'm not sure what your particular copy error is, but... with getBlobToFile() you're physically moving blob content from blob storage to your VM (or local machine), and then with createBlockBlobFromFile() you're pushing the entire contents back to blob storage, resulting in two physical network moves.
The Azure Storage service supports blob copy as a first-class operation. While it's available via a REST API call, it's also wrapped in the same SDK you're using, in the method BlobService.startCopyBlob() (source code here). This instructs the storage service to initiate an async copy operation entirely within the storage system (meaning no download + upload on your side). You'll be able to set source and destination, set timeouts, etc. (all parameters are fully documented in the source code).
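The question is about the Node.js SDK, but for comparison, here is the same server-side copy in the v12 Java SDK used elsewhere on this page (a hedged sketch; the names are placeholders, and a private source would need a SAS appended to its URL):

import com.azure.core.util.polling.SyncPoller;
import com.azure.storage.blob.BlobClient;
import com.azure.storage.blob.BlobContainerClient;
import com.azure.storage.blob.models.BlobCopyInfo;
import java.time.Duration;

public class ServerSideCopy {
    static void copy(BlobContainerClient container, String sourceName, String destName) {
        BlobClient source = container.getBlobClient(sourceName);
        // the copy runs entirely inside the storage service; no bytes travel
        // through the client (a private source URL would need a SAS appended)
        SyncPoller<BlobCopyInfo, Void> poller = container.getBlobClient(destName)
                .beginCopy(source.getBlobUrl(), Duration.ofSeconds(1));
        poller.waitForCompletion();
    }
}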
The link in the accepted answer is broken, although the method is correct: the method startCopyBlob is documented here
(Updated: Jan 3, 2020) https://learn.microsoft.com/en-us/javascript/api/azure-storage/BlobService?view=azure-node-latest#azure_storage_BlobService_createBlockBlobFromLocalFile

Avoid over-writing blobs AZURE

If I upload a file to an Azure blob container where a file with the same name already exists, the existing file is overwritten. How do I avoid overwriting it? Below I describe the scenario...
Step 1 - upload file "abc.jpg" to Azure, in a container called, say, "filecontainer"
Step 2 - once it is uploaded, try uploading a different file with the same name to the same container
Output - it overwrites the existing file with the latest upload
My requirement - I want to avoid this overwrite, as different people may upload files having the same name to my container.
Please help.
P.S.
- I do not want to create different containers for different users
- I am using the REST API with Java
Windows Azure Blob Storage supports conditional headers, which you can use to prevent overwriting of blobs. You can read more about conditional headers here: http://msdn.microsoft.com/en-us/library/windowsazure/dd179371.aspx.
Since you want a blob to never be overwritten, you would specify the If-None-Match conditional header and set its value to *. If the blob already exists, this causes the upload operation to fail with a Precondition Failed (412) error.
Another idea would be to check for the blob's existence just before uploading (by fetching its properties); however, I would not recommend this approach as it may lead to concurrency issues.
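Since you mentioned using the REST API from Java, here is a hedged sketch of that conditional PUT over raw HTTP (the account, container, and SAS token in the URL are placeholders; a SAS sidesteps hand-rolling the SharedKey signature):

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class ConditionalPut {
    public static void main(String[] args) throws Exception {
        byte[] body = "example content".getBytes();
        URL url = new URL("https://myaccount.blob.core.windows.net/filecontainer/abc.jpg?sv=PLACEHOLDER-SAS");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("PUT");
        conn.setDoOutput(true);
        conn.setFixedLengthStreamingMode(body.length);
        conn.setRequestProperty("x-ms-blob-type", "BlockBlob");
        conn.setRequestProperty("If-None-Match", "*"); // fail rather than overwrite
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        int status = conn.getResponseCode();
        if (status == 412) {
            System.out.println("Blob already exists - choose a different name");
        } else if (status != 201) {
            throw new RuntimeException("Upload failed: HTTP " + status);
        }
    }
}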
You have no control over the names your users upload their files with. You do, however, have control over the names you store those files under. The standard way is to generate a GUID and name each file accordingly. The chance of a conflict is practically zero.
Simple pseudocode looks like this:
// generate a GUID and rename the file the user uploaded with it
// store the original file name together with the GUID in a database (or similar)
// upload the file to blob storage using the generated name
Hope that helps.
Let me put it this way:
Step one - user X uploads file "abc1.jpg" and you save it to a local folder XYZ
Step two - user Y uploads another file with the same name "abc1.jpg", and now you save it again in the local folder XYZ
What do you do now?
With this I am illustrating that your question does not really relate to Azure at all!
Just do not rely on original file names when saving files, wherever you are saving them. Generate random names (GUIDs, for example) and "attach" the original name as metadata.
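A minimal Java sketch of that approach with the v12 SDK (the "originalfilename" metadata key is my own choice; blob metadata values must be ASCII, so keep a database mapping for anything else):

import com.azure.storage.blob.BlobClient;
import com.azure.storage.blob.BlobContainerClient;
import java.io.InputStream;
import java.util.Map;
import java.util.UUID;

public class GuidNaming {
    // store under a GUID and keep the user's original filename as blob metadata
    static String store(BlobContainerClient container, InputStream data, long length, String originalName) {
        BlobClient blob = container.getBlobClient(UUID.randomUUID().toString());
        blob.upload(data, length);
        blob.setMetadata(Map.of("originalfilename", originalName));
        return blob.getBlobName();
    }
}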
