Azure block blob uncommitted update behavior

I am working on a project that needs concurrent updates to a block blob. From the Microsoft documentation:
Uncommitted Block List: The list of blocks that have been uploaded for a blob using Put Block, but that have not yet been committed. These blocks are stored in Azure in association with a blob, but do not yet form part of the blob.
I could not find any documentation on:
Whether an update on an uncommitted blob could/should be performed.
What happens when you write to an uncommitted blob.
How long a block blob can take to go from uncommitted to committed, based on the consistency policy you choose.
Could someone provide more context on the concurrent update behavior of uncommitted block blobs?

I tested this with Fiddler and the latest blob storage NuGet package, Microsoft.Azure.Storage.Blob, version 11.1.0.
When you use the UploadFromByteArray method to upload to Azure Blob Storage, there are a couple of scenarios:
1. The file (or byte array) is not big, say 10 MB or 100 MB. Then there is no uncommitted status for the blob. In this case the default "last writer wins" concurrency policy applies, so you don't need to worry about uncommitted blocks at all.
2. The file (or byte array) is big, say 200 MB. In that case UploadFromByteArray splits it into many blocks, each with a unique block ID.
In this second case, while the blob is uncommitted (before the Put Block List API is called), you cannot perform another write operation on the blob. A second write operation fails with the error "The specified blob or block content is invalid." I confirmed this in my test.
Regarding your 3rd question: as per my test (measured with Fiddler), the time for the status to change from uncommitted (after the Put Block calls) to committed (after Put Block List) is very short, less than 1 second.
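For readers on the newer JavaScript/TypeScript SDK (@azure/storage-blob) rather than the .NET package tested above, here is a minimal sketch of that block lifecycle; the connection string, container and blob names are placeholders, and the block sizes are arbitrary. Blocks staged with stageBlock (Put Block) sit in the uncommitted block list until commitBlockList (Put Block List) is called.

```typescript
import { BlobServiceClient } from "@azure/storage-blob";

async function uploadInBlocks(): Promise<void> {
  // Placeholder connection string / names -- adjust for your own account.
  const service = BlobServiceClient.fromConnectionString(
    process.env.AZURE_STORAGE_CONNECTION_STRING!
  );
  const blockBlob = service
    .getContainerClient("mycontainer")
    .getBlockBlobClient("bigfile.bin");

  const chunks = [Buffer.alloc(4 * 1024 * 1024, 1), Buffer.alloc(4 * 1024 * 1024, 2)];
  const blockIds: string[] = [];

  for (let i = 0; i < chunks.length; i++) {
    // Block IDs must be base64-encoded and the same length within a blob.
    const blockId = Buffer.from(String(i).padStart(6, "0")).toString("base64");
    blockIds.push(blockId);
    // Put Block: the block is stored, but is not yet part of the blob.
    await blockBlob.stageBlock(blockId, chunks[i], chunks[i].length);
  }

  // The staged blocks only show up in the uncommitted block list for now.
  const blockList = await blockBlob.getBlockList("uncommitted");
  console.log("uncommitted blocks:", blockList.uncommittedBlocks?.length);

  // Put Block List: commits the blocks and makes the blob readable.
  await blockBlob.commitBlockList(blockIds);
}

uploadInBlocks().catch(console.error);
```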
Hope it helps.

Related

List Blobs in a Container having 1000+ files (a large number of files) [AZURE-STORAGE][REST]

I am trying to make a List Blobs in a Container REST call to my Azure storage account.
Currently, I have a limited number of files, and the response I get is in the XML format specified in the official docs.
But if there are thousands of files in this container, the response will be huge. Will it still be in the same XML format, or will there be some pagination or paging?
I can't practically test this with thousands of files, and I couldn't find anything about it in the docs.
The simple answer to your question is yes. The maximum number of blobs that can be returned in a single List Blobs request is 5000 (it can be fewer than that, and even zero).
If more blobs are available, you'll get a continuation token which you can use to fetch the next set of blobs. This continuation token can be found in the NextMarker element in the response body.
For further reference, please see this link: https://learn.microsoft.com/en-us/rest/api/storageservices/enumerating-blob-resources.
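If you go through one of the SDKs instead of raw REST, the continuation-token handling is wrapped for you. As a rough illustration (using the JavaScript/TypeScript @azure/storage-blob package with placeholder names), the byPage() iterator passes each NextMarker into the next List Blobs request automatically:

```typescript
import { BlobServiceClient } from "@azure/storage-blob";

async function listAllBlobs(): Promise<void> {
  const container = BlobServiceClient
    .fromConnectionString(process.env.AZURE_STORAGE_CONNECTION_STRING!)
    .getContainerClient("mycontainer"); // placeholder container name

  let total = 0;
  // One List Blobs call per page; the SDK feeds the continuation token
  // (NextMarker) from each response into the following request.
  for await (const page of container.listBlobsFlat().byPage({ maxPageSize: 5000 })) {
    for (const blob of page.segment.blobItems) {
      total++;
      console.log(blob.name);
    }
  }
  console.log(`listed ${total} blobs in total`);
}

listAllBlobs().catch(console.error);
```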

Azure blob upload: rename if blob name exists

In Azure blob upload, a file is overwritten if you upload a new file with the same file name (in the same container).
I would like to rename the new file before saving it, to avoid overwriting any files - Is this possible?
Scenario:
Upload file "Image.jpg" to container "mycontainer"
Upload file "Image.jpg" to container "mycontainer" (with different content)
Rename the second "Image.jpg" to "Image_{guid}.jpg" before saving it to "mycontainer".
You cannot rename a blob (there's no API for it). Your options:
check if the blob name exists prior to uploading, and choose a different name for your about-to-be-uploaded blob if the name is already in use
simulate a rename by copying the existing blob to a new blob with a different name, then deleting the original blob
As @juunas pointed out in the comments: you'd have to manage your workflow to avoid a potential race condition around checking for existence, renaming, etc.
I recommend using an "If-None-Match: *" conditional header (sometimes known as "If-Not-Exists" in the client libraries). If you include this header on either your PutBlob or PutBlockList operations, the call will fail if the blob already exists and the data will not be overwritten. You can catch this client-side and retry the upload operation (with a different blob name).
This has two advantages over checking to see if the blob exists before uploading. First, you no longer have the potential race condition. Second, calling Exists() adds a lot of additional overhead - an additional HTTP call for every upload, which is significant unless your blobs are quite large or latency doesn't matter. With the access condition, you only need multiple calls when the name collides, which should hopefully be a rare case.
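As a concrete sketch of the conditional-header approach (using the newer @azure/storage-blob TypeScript package; the container name and the GUID-suffix retry scheme are illustrative assumptions): the upload is sent with ifNoneMatch: "*", and on a 409 conflict the client retries once under a unique name.

```typescript
import { randomUUID } from "crypto";
import { BlobServiceClient } from "@azure/storage-blob";

async function uploadWithoutOverwrite(name: string, data: Buffer): Promise<string> {
  const container = BlobServiceClient
    .fromConnectionString(process.env.AZURE_STORAGE_CONNECTION_STRING!)
    .getContainerClient("mycontainer"); // placeholder container

  try {
    // conditions.ifNoneMatch: "*" maps to the If-None-Match: * header,
    // so the Put Blob call fails if a blob with this name already exists.
    await container.getBlockBlobClient(name).upload(data, data.length, {
      conditions: { ifNoneMatch: "*" },
    });
    return name;
  } catch (err: any) {
    if (err?.statusCode === 409) {
      // Name collision: retry once with a unique name, e.g. Image_{guid}.jpg.
      const dot = name.lastIndexOf(".");
      const unique = dot > 0
        ? `${name.slice(0, dot)}_${randomUUID()}${name.slice(dot)}`
        : `${name}_${randomUUID()}`;
      await container.getBlockBlobClient(unique).upload(data, data.length);
      return unique;
    }
    throw err;
  }
}
```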
Of course, it might be easier / cleaner to just always use the GUID, then you don't have to worry about it.
Needing to rename may be indicative of an anti-pattern. If your ultimate goal is to change the name of the file when downloaded, you can do so and keep the blob name abstract and unique.
You can set the HTTP download filename by assigning the ContentDisposition property with
attachment;filename="yourfile.txt"
This will ensure that the header is set when the blob is accessed either as public or a SAS url.
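For example, a minimal sketch with the @azure/storage-blob TypeScript package (the container and blob names are placeholders): the blob keeps its abstract, unique name and only the download filename is controlled via the Content-Disposition header.

```typescript
import { BlobServiceClient } from "@azure/storage-blob";

// The blob keeps an abstract, unique name; the browser sees "yourfile.txt"
// when the blob is downloaded via a public or SAS URL.
async function setDownloadName(): Promise<void> {
  const blob = BlobServiceClient
    .fromConnectionString(process.env.AZURE_STORAGE_CONNECTION_STRING!)
    .getContainerClient("mycontainer")
    .getBlockBlobClient("7a1d6c2e-unique-blob-name"); // placeholder

  await blob.setHTTPHeaders({
    blobContentDisposition: 'attachment; filename="yourfile.txt"',
  });
}

setDownloadName().catch(console.error);
```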

Is the Azure Put Blob operation atomic?

The documentation for Azure's Put Blob REST API operation tells us that it is possible to upload a block blob up to 64 MB with a single request.
I'm wondering whether such an operation is atomic. In particular I need to know whether the following assumptions are true or false.
If two or more clients concurrently try to put a particular non-existing blob using this API specifying If-None-Match: *, then at most one of them will succeed.
A blob put using this API will never be partially exposed. It will either not exist or exist with the entire content that was put (<64MB) including metadata.
Can anyone confirm or refute these assumptions?
I have received confirmation directly from a Microsoft support technician that both of these assumptions are true:
If two or more clients concurrently try to put a particular non-existing blob using this API specifying If-None-Match: *, then at most one of them will succeed.
A blob put using this API will never be partially exposed. It will either not exist or exist with the entire content that was put (<64MB) including metadata.
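A quick way to probe the first assumption yourself is to fire several concurrent creates of the same non-existing blob with If-None-Match: * and count how many succeed; at most one should. A rough sketch with the @azure/storage-blob TypeScript package (connection string and container name are placeholders):

```typescript
import { BlobServiceClient } from "@azure/storage-blob";

async function raceToCreate(): Promise<void> {
  const container = BlobServiceClient
    .fromConnectionString(process.env.AZURE_STORAGE_CONNECTION_STRING!)
    .getContainerClient("mycontainer"); // placeholder

  const blob = container.getBlockBlobClient(`race-test-${Date.now()}`);
  const bodies = [1, 2, 3].map((n) => Buffer.from(`payload from client ${n}`));

  // Three uploads target the same (non-existing) blob, all with If-None-Match: *.
  const results = await Promise.allSettled(
    bodies.map((body) =>
      blob.upload(body, body.length, { conditions: { ifNoneMatch: "*" } })
    )
  );

  const winners = results.filter((r) => r.status === "fulfilled").length;
  console.log(`${winners} of ${results.length} concurrent creates succeeded`); // expect at most 1
}

raceToCreate().catch(console.error);
```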
Is the Azure Put Blob operation atomic?
Answer: Not at all.
"Any attempt to read the blob before the completion of step 3 would result in HTTP 404 (not found)."
Yes, you are guaranteed to receive a 404.
"Any attempt to read the blob after the completion of step 3 would either see the entire blob content and metadata, or result in HTTP 404 (not found) in case step 3 was not successful."
Yes, if the operation isn't complete, there is no file in blob storage.
"Any attempt to put the blob with an If-None-Match: * header before the start of step 2 would have to wait until step 3 is completed, either successfully, in which case the request must fail with HTTP 409 (precondition failed), or continue normally, since the blob would not exist."
In my testing: there's no wait.
So, normally, on a second attempt to upload the same blob name, you will receive HTTP/1.1 409 "The specified blob already exists" (but only if you sent the request with the If-None-Match: * header).
The problem is that if the first upload hasn't yet received its first 201 confirmation (or its only one, if you're uploading everything in a single request), then the second upload will be allowed to create the resource even though it was launched after the first one. This tends to happen when the second file is shorter than the first, because it can finish transmission in just that one (short) request.
The weirdest thing is that when this happens, the first stream keeps uploading data normally; only when its last request is emitted does the answer come back as 409.
I strongly recommend building a spike solution to test your specific use case, because the situation described above may not apply to your application.

Azure Storage copy an image from blob to blob

I am using Azure Storage for Node.js, and what I need to do is copy an image from one blob to another.
First I tried getBlobToFile to get the image to a temp location on disk and then createBlockBlobFromFile from that temp location. That method did the job, but for some reason the copy wasn't complete in about 10% of cases.
Then I tried getBlobToText and put the result into createBlockBlobFromText; I also tried passing the options needed for the blob to be treated as an image. That method failed completely; the image wouldn't even open after the copy.
Perhaps there is a way to copy a blob and paste it into another blob, but I didn't find such a method.
What else can I do?
I'm not sure what your particular copy error is, but... with getLocalBlobToFile(), you're actually physically moving blob content from blob storage to your VM (or local machine), and then with createBlockBlobFromLocalFile() you're pushing the entire contents back to blob storage, which results in two physical network moves.
The Azure Storage system supports blob-copy as a 1st-class operation. While it's available via REST API call, it's also wrapped in the same SDK you're using, in the method BlobService.startCopyBlob() (source code here). This will instruct the storage to initiate an async copy operation, completely within the storage system (meaning no download+upload on your side). You'll be able to set source and destination, set timeouts, etc. (all parameters are fully documented in the source code).
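For reference, the same server-side copy in the newer @azure/storage-blob package looks roughly like this (container and blob names are placeholders; the classic azure-storage package's BlobService.startCopyBlob() is the equivalent call):

```typescript
import { BlobServiceClient } from "@azure/storage-blob";

async function copyImage(): Promise<void> {
  const service = BlobServiceClient.fromConnectionString(
    process.env.AZURE_STORAGE_CONNECTION_STRING!
  );
  const source = service.getContainerClient("source-container").getBlobClient("image.jpg");
  const target = service.getContainerClient("target-container").getBlobClient("image-copy.jpg");

  // Server-side copy: the storage service moves the bytes internally;
  // nothing is downloaded to, or uploaded from, this machine.
  const poller = await target.beginCopyFromURL(source.url);
  const result = await poller.pollUntilDone();
  console.log("copy status:", result.copyStatus);
}

copyImage().catch(console.error);
```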
The link in the accepted answer is broken, although the method is correct: startCopyBlob is documented here:
(Updated: Jan 3, 2020) https://learn.microsoft.com/en-us/javascript/api/azure-storage/BlobService?view=azure-node-latest#azure_storage_BlobService_createBlockBlobFromLocalFile

Azure Block Blob PUT fails when using HTTPS

I have written a Cygwin app that uploads block blobs (using the REST API PUT operation) to my Azure storage account, and it works well for blobs of different sizes when using HTTP. However, use of SSL (i.e. PUT using HTTPS) fails for blobs greater than 5.5 MB; blobs smaller than 5.5 MB upload correctly. Anything greater and I find that the TCP session (as seen by Wireshark) reports a dwindling window size that goes to 0 once the aforementioned number of bytes has been transferred. The failure is repeatable and consistent. As a point of reference, PUT operations against my Google/AWS/HP cloud storage accounts work fine when using HTTPS for various object sizes, which suggests the problem is not my client but something specific to the HTTPS implementation on the MS Azure storage servers.
If I upload the 5.5MB blob as two separate uploads of 4MB and 1.5MB followed by a PUT Block List, the operation succeeds as long as the two uploads used separate HTTPS sessions. Notice the emphasis on separate. That same operation fails if I attempt to maintain an HTTPS session across both uploads.
Any ideas on why I might be seeing this odd behavior with MS Azure? Same PUT operation with HTTPS works ok with AWS/Google/HP cloud storage servers.
Thank you for reporting this and we apologize for the inconvenience. We have managed to recreate the issue and have filed a bug. Unfortunately we cannot share a timeline for the fix at this time, but we will respond to this forum when the fix has been deployed. In the meantime, a plausible workaround (and a recommended best practice) is to break large uploads into smaller chunks (using the Put Block and Put Block List APIs), thus enabling the client to parallelize the upload.
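A rough sketch of that chunked workaround (using the current @azure/storage-blob TypeScript package; the names and the 4 MB chunk size are illustrative assumptions): stage the blocks over parallel requests, then commit the block list.

```typescript
import { BlobServiceClient } from "@azure/storage-blob";

async function chunkedParallelUpload(data: Buffer): Promise<void> {
  const blockBlob = BlobServiceClient
    .fromConnectionString(process.env.AZURE_STORAGE_CONNECTION_STRING!)
    .getContainerClient("mycontainer")
    .getBlockBlobClient("large-upload.bin"); // placeholder names

  const chunkSize = 4 * 1024 * 1024; // 4 MB per block (arbitrary choice)
  const blockIds: string[] = [];
  const staging: Promise<unknown>[] = [];

  for (let offset = 0, i = 0; offset < data.length; offset += chunkSize, i++) {
    const chunk = data.subarray(offset, Math.min(offset + chunkSize, data.length));
    const blockId = Buffer.from(String(i).padStart(6, "0")).toString("base64");
    blockIds.push(blockId);
    staging.push(blockBlob.stageBlock(blockId, chunk, chunk.length)); // Put Block
  }

  await Promise.all(staging);                // blocks upload over parallel requests
  await blockBlob.commitBlockList(blockIds); // Put Block List
}
```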
This bug has now been fixed and the operation should now complete as expected.
