Is the Azure Put Blob operation atomic?

Is the Azure Put Blob operation atomic? - azure

The documentation for Azure's Put Blob REST API operation tells us that it is possible to upload a block blob up to 64 MB with a single request.
I'm wondering whether such an operation is atomic. In particular I need to know whether the following assumptions are true or false.
If two or more clients concurrently to put a particular non-existing blob using this API specifying If-None-Match: *, then at most one of them will succeed.
A blob put using this API will never be partially exposed. It will either not exist or exist with the entire content that was put (<64MB) including metadata.
Can anyone confirm or refute these assumptions?

I have received confirmation directly from a Microsoft support technician that both of these assumptions are true:
If two or more clients concurrently to put a particular non-existing blob using this API specifying If-None-Match: *, then at most one of them will succeed.
A blob put using this API will never be partially exposed. It will either not exist or exist with the entire content that was put (<64MB) including metadata.

Is the Azure Put Blob operation atomic?
Answer: Not at all.
Any attempt to read the blob before the completion of step 3 would
result in HTTP 404 (not found).
Yes, 100% secure you'll receive a 404
Any attempt to read the blob after the completion of step 3 would
either see the entire blob content and meta data, or result in HTTP
404 (not found) in case step 3 was not successful.
Yes, if the operation isn't complete there is no file in blob storage
Any attempt to put the blob with an If-None-Match: * header before the
start of step 2 would have to wait until step 3 is completed, either
successfully in which case the request must fail with HTTP 409
(precondition failed) or continue normally, since the blob would not
exist.
In my testing: there's no wait.
So, normally after a second attempt to upload the same file name you will receive a HTTP/1.1 409 The specified blob already exists. (just if you have sent the request with If-None-Match:* header)
The problem is that if the first upload file hasn't received yet the first 201 confirmation (or unique if you're uploading all in one request) then the second file will be allowed to create the resource even if it was launched after the first one. This use to happen if the second file is shorter than the first one because maybe in just the 1st (short ) request the file will finish the transmission.
The weirdest thing is that when this happen the first stream will continue uploading data normally until when last request is emitted, the answer for the last request will be 409.
I strongly recommend you to create a spike solution to test your specific use case because the situation described above maybe is not a valid use case for your application.

Related

Azure file storage: High count of ClientOtherError

Just noticed that my fileshare storage in Azure has a very high rate of the "ClientOtherError" appearing. They're running at anywhere from 50-100% of the success count.
Anyone have any experience as to why this might be?
The attached graph shows the ClientOtherError transactions in red/orange and the successful transactions in blue.

here my error rate as compare (Win FS and also used by AKS cluster)
I do a good amount of overwrites, maybe that contributes to the number of errors.

ClientOtherError :
Authorized request that failed as expected. This error can represent
many 300-400 level HTTP status codes and conditions such as NotFound
and ResourceAlreadyExists.
We came across very high percentage of ClientOtherError (Failed transaction by response type) with our Azure blob storage. However, in our case this error can be ignored. This was happening because operations were being performed on files that didn't exist. They were basically successful API calls that return non 200 HTTP status code. In our scenario, Failed transaction by API name showed below items.
DeleteFile
GetBlobProperties
GetFileProperties
blob storage example 1:
blob storage example 2:
ClientOtherError usually means expected errors, such as not found and resource already exists. If your code uses APIs such as Exists, Create***IfNotExist(for example, CreateTableIfNotExist), those errors will be encountered frequently. Some examples of operations that execute successfully but that can result in unsuccessful HTTP status codes include:
ResourceNotFound (Not Found 404), for example from a GET request to a blob that does not exist.
ResourceAlreadyExists (Conflict 409), for example from a CreateIfNotExist operation where the resource already exists.
ConditionNotMet (Not Modified 304), for example from a conditional operation such as when a client sends an ETag value and an HTTP If-None-Match header to request an image only if it has been updated since the last operation.
In order to debug this further, you can use Azure storage logging which would log information about every operation performed against your storage account. It will include the HTTP code of every response.
Here is a list of common status codes. Many (not all) 300-400 level HTTP status code will result in ClientOtherError.
OP seems to face this problem with Azure file share. I suspect something similar is happening. The windows storage explorer application under the hood probably does similar API calls resulting in ClientOtherError.
File share here seems to have similar API's that can result in ClientOtherError when the file is missing.
I would say that in most of the cases this error is expected and can be ignored.

Upload 1Gb file through Logic app using sftp-ssh to Azure File share

I am using the Logic App to upload 1 Gb file as below-
Trigger - When files are added or modified (properties only)
Action1 - Get file content
Action 2- Create file(Azure fileshare)
Till 35 MB all triggers and actions works fine. After the file uploaded in SFTP crosses 40 MB, SFTP-SSH trigger and action all works fine. But while the workflow moves to the second action - 'Create File': it fails with the below error The specified resource may be in use by an SMB client'. When I see the Azure file share storage account, I see filename.partial.lock getting created. I modified the access policy as well, but the issue persists.

Logic apps are not designed to upload or download the large amount data from source/destination, its a workflow solution which you can design to provide solution your business need, however you can still use chunk upload functionality in logic-app to upload or download large file via logic app.
please refer
https://learn.microsoft.com/en-us/azure/logic-apps/logic-apps-handle-large-messages#set-up-chunking

To upload large file, make sure enable the Allow chunking.
From your description, suppose it should be SharingViolation, you could check the error codes here.
And in the official doc, there are two scenario to get the Sharing Violation error:
Sharing Violation Due to File Access
Client A opens the file with FileAccess.Write and FileShare.Read
(denies subsequent Write/Deletewhile open).
Client B then opens the file with FileAccess.Write with
FileShare.Write (denies subsequent Read/Delete while open).
Result: Client B encounters a sharing violation since it specified a
file access that is denied by the share mode specified previously by
Client A.
Sharing Violation Due to Share Mode
Client A opens the file with FileAccess.Write and FileShare.Write
(denies subsequent Read/Delete while open).
Client B then opens the file with FileAccess.Write with
FileShare.Read (denies subsequent Write/Delete while open).
Result: Client B encounters a sharing violation since it specified a
share mode that denies write access to a file that is still open for
write access.
These are scenario you need to consider, another choice you could try to use the REST API to upload the file and in the HTTP action set the Allow chunking.

I think one more way to resolve the issue is redesign which could be more scale-able solution:
Use logic app to get notification if file is added or modified on SFTP/FTP location.
once the file is added read the file path for that file.
Create Service bus message and send the file Path to Service Bus message as message content.
Create Service Bus queue message trigger Azure function which listen to those message (created in step 2)
Azure function will read the Chunk of files from SFTP using the file path.
this way you can read or write more then 30 GB file.
this solution will be more scale able solution as azure function and auto scale on demand.

Thank you all. The reason for the error 'The specified resource may be in use by an SMB client' is due to the mounting of the file share with two Linux Virtual machines. We unmounted the Linux VMs and did fresh single mounting. The error got fixed with that.

MSFT has confirmed in our discussion that "Create File share" has a limitation of 100 or 300 MB. It was just able to work with SFTP as data arrives in chunk. MSFT is working further to give proper error statement, when the file size is beyond 100 or 300 MB. Below is quoted from MSFT email-
"Thanks for the details , actually the Product team confirm to me that your flow is working by luck , it should not work with this sizes an they are working to implement the limits correctly to prevent files larger than maximum size which is maybe 300 or 100 “I am not yet sure ”
And this strange behavior is only happing when we are reading the content from SFTP chucked.
"

Azure block blob uncommited update behavior

I am writing a project that needs concurrent update to the block blob. From Microsoft documentation:
Uncommitted Block List: The list of blocks that have been uploaded for a blob using Put Block, but that have not yet been committed. These blocks are stored in Azure in association with a blob, but do not yet form part of the blob.
I could not find any documentation on
Whether an update on uncommitted blob could/should be performed
What happens when you write to an uncommitted blob.
How long is the time bound on a block blob to go from uncommitted -> committed based on the consistency policy you choose.
Could someone provide more context on the concurrent update behavior of uncommitted block blobs?

I tested it with fiddler, and the latest blob storage nuget package Microsoft.Azure.Storage.Blob, version 11.1.0
When you're using UploadFromByteArray method to upload to azure blob storage, there're some scenarios:
1.The files(or bytes array) are not big, like 10M or 100M, then there is no uncommitted status for the blob. In this case, by default, the concurrency policy "last writes win" apply. So here, you don't need to worry about uncommitted thing.
2.If the files(or bytes array) are big, like 200M, when you use UploadFromByteArray method, it will split into many blocks with a unique block id.
In this case, when the blob is uncommitted(before it calls put block list api), you can not perform another write operation for the blob. If you have a 2nd write operation, there is a error message "The specified blob or block content is invalid." for the 2nd write operation. I tested this, you can see the screenshot below:
Regarding your 3rd question, as per my test, when the status changes from uncommitted(when using put block api) -> committed(when using put block list), with the help of fiddler, I calculate the time is very short, less than 1s:
Hope it helps.

Azure batch insert - how to reduce response length?

I use Azure client lib to perform batch inserts into Azure Table Storage. Everything works fine. But when I sniff requests using Fiddler I found that every response from azure is about 90KB. I've changed prefer header to "return-no-content", but the response is still over 60KB (when request is 50KB).
Is there any way to reduce response lenght? Just to be like 100B (HTTP 202 or something).

Per OData V3 protocol format that Azure Storage Table Service uses, a batch response body must structurally match one-to-one with the corresponding batch request body, such that the same multipart MIME message structure defined for requests is used for responses.
Setting echoContent to false (aka "Prefer: return-no-content") would ensure that entities themselves are not returned though as you also observed, therefore reducing the size of the response body.

Azure Block Blob PUT fails when using HTTPS

I have written a Cygwin app that uploads (using the REST API PUT operation) Block Blobs to my Azure storage account, and it works well for different size blobs when using HTTP. However, use of SSL (i.e. PUT using HTTPS) fails for Blobs greater than 5.5MB. Blobs less than 5.5MB upload correctly. Anything greater and I find that the TCP session (as seen by Wireshark) reports a dwindling window size that goes to 0 once the aforementioned number of bytes are transferred. The failure is repeatable and consistent. As a point of reference, PUT operations against my Google/AWS/HP cloud storage accounts work fine when using HTTPS for various object sizes, which suggests the problem is not my client but specific to the HTTPS implementation on the MSAZURE storage servers.
If I upload the 5.5MB blob as two separate uploads of 4MB and 1.5MB followed by a PUT Block List, the operation succeeds as long as the two uploads used separate HTTPS sessions. Notice the emphasis on separate. That same operation fails if I attempt to maintain an HTTPS session across both uploads.
Any ideas on why I might be seeing this odd behavior with MS Azure? Same PUT operation with HTTPS works ok with AWS/Google/HP cloud storage servers.

Thank you for reporting this and we apologize for the inconvenience. We have managed to recreate the issue and have filed a bug. Unfortunately we cannot share a timeline for the fix at this time, but we will respond to this forum when the fix has been deployed. In the meantime, a plausible workaround (and a recommended best practice) is to break large uploads into smaller chunks (using the Put Block and Put Block List APIs), thus enabling the client to parallelize the upload.

This bug has now been fixed and the operation should now complete as expected.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string