Azure file storage: High count of ClientOtherError - azure

Just noticed that my fileshare storage in Azure has a very high rate of the "ClientOtherError" appearing. They're running at anywhere from 50-100% of the success count.
Anyone have any experience as to why this might be?
The attached graph shows the ClientOtherError transactions in red/orange and the successful transactions in blue.

Here is my error rate for comparison (a Windows file share, also used by an AKS cluster).
I do a good amount of overwrites, maybe that contributes to the number of errors.

ClientOtherError :
Authorized request that failed as expected. This error can represent
many 300-400 level HTTP status codes and conditions such as NotFound
and ResourceAlreadyExists.
We came across a very high percentage of ClientOtherError (Failed transactions by response type) with our Azure blob storage. However, in our case this error can be ignored. It was happening because operations were being performed on files that didn't exist: basically successful API calls that returned a non-200 HTTP status code. In our scenario, Failed transactions by API name showed the items below.
DeleteFile
GetBlobProperties
GetFileProperties
[Screenshots: blob storage example 1 and blob storage example 2.]
ClientOtherError usually means expected errors, such as not found and resource already exists. If your code uses APIs such as Exists or Create***IfNotExist (for example, CreateTableIfNotExist), those errors will be encountered frequently. Some examples of operations that execute successfully but that can result in unsuccessful HTTP status codes include (see the sketch after this list):
ResourceNotFound (Not Found 404), for example from a GET request to a blob that does not exist.
ResourceAlreadyExists (Conflict 409), for example from a CreateIfNotExist operation where the resource already exists.
ConditionNotMet (Not Modified 304), for example from a conditional operation such as when a client sends an ETag value and an HTTP If-None-Match header to request an image only if it has been updated since the last operation.
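To make that concrete, here is a minimal sketch with the Python SDK (azure-storage-blob v12); the container and blob names are placeholders, and the point is only that these deliberately handled exceptions still show up in the metrics as ClientOtherError:

import os
from azure.core.exceptions import ResourceExistsError, ResourceNotFoundError
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string(os.environ["STORAGE_CONN"])
container = service.get_container_client("logs")

try:
    container.create_container()           # 409 if the container already exists
except ResourceExistsError:
    pass                                   # expected; counted as ClientOtherError

blob = container.get_blob_client("missing.txt")
try:
    props = blob.get_blob_properties()     # 404 if the blob does not exist
except ResourceNotFoundError:
    props = None                           # also counted as ClientOtherError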
To debug this further, you can use Azure Storage logging, which records information about every operation performed against your storage account, including the HTTP status code of every response.
Here is a list of common status codes. Many (but not all) 300-400 level HTTP status codes will result in ClientOtherError.
The OP seems to face this problem with an Azure file share, and I suspect something similar is happening there. The Windows Storage Explorer application probably makes similar API calls under the hood, resulting in ClientOtherError.
File shares have similar APIs that can result in ClientOtherError when the file is missing.
I would say that in most of the cases this error is expected and can be ignored.

Related

Firebase Cloud Express queue for storage resource to be generated

I have a large dataset stored in a Firestore collection and a Nodejs express app (exposed as a firebase functions.https.onRequest) with an endpoint which allows users to query this dataset and download large amounts of data.
I need to return the data in CSV format from the endpoint. Because there is a lot of data, I want to avoid doing large database reads each time the endpoint is hit.
My current endpoint does this:
User hits endpoint with a database query requesting documents within a range
Query is hashed into a filename, e.g. query_"startRange"_"endRange".csv
Check Firebase storage to see if this query has been run before
if the csv already exists:
return a 302 redirect to the csv file with a signed url
if the csv doesn't exist:
Run the query on the Firestore collection
Transform the data into the appropriate CSV format
upload the new CSV to Firebase storage
return a 302 redirect to the newly generated csv file with a signed url
This process is currently working really well, except I can already foresee an issue. The CSV generation stage takes roughly 20s for large queries and there is a high possibility of the same request being hit from multiple users at the same time.
I want to build in some sort of queuing system so that if X number of users hit the endpoint at once, only the first request triggers the generation of the new CSV and the other (X-1) requests will be queued and then resolved once the CSV is generated.
I have currently looked into firebase-queue which appears to be deprecated and not intended to be used with Cloud functions.
I have also seen other libraries like p-queue, but I'm not sure I understand how that would work with Firebase Cloud Functions and how separate instances are booted for many requests.
I think that in your scenario the queue approach wouldn't work well with Cloud Functions. The queue cannot be implemented in a function, as multiple instances won't know about each other, so the queue would need to be implemented on some kind of dedicated server, which IMO defeats the purpose of using Cloud Functions, since both the queue and the processing could run on that same server.
I would suggest having a collection in Firestore that keeps track of the queries that have been requested. This way, even if the CSV file isn't yet saved in Storage, you can check whether some function instance is already creating it, then sleep the function until the operation completes and return the signed URL. Overall the algorithm might look somewhat like this:
# Python pseudocode (csv_in_storage, signed_url, query_in_firestore,
# add_query_firestore, create_csv and upload_csv are placeholder helpers)
def handle_request():
    if csv_in_storage():
        return signed_url()
    if query_in_firestore():
        # Another instance already registered this query; wait for its CSV.
        while True:
            sleep(X)
            if csv_in_storage():
                return signed_url()
    try:
        add_query_firestore()  # fails if a concurrent instance wrote the doc first
        csv = create_csv()
        upload_csv(csv)
        return signed_url()
    except Exception:
        # Lost the race: another instance is generating the CSV; wait for it.
        while True:
            sleep(X)
            if csv_in_storage():
                return signed_url()
The final try/except is there because the add_query_firestore operation might fail if two functions make simultaneous attempts to write the same document into Firestore. Nonetheless, this is also good news, since you know the CSV creation is in progress and you can wait for it to complete.
Please keep in mind the pseudocode above is just to illustrate the idea; having the while True as it is may lead to an infinite loop and a function timeout, which is plain bad :).
I ended up solving this using a solution similar to what Happy-Monad suggested. I'm using the nodejs admin SDK but the idea is similar.
There is a collection in Firestore which keeps track of executed queries, Queries. When a user hits the endpoint, I call the admin doc("Queries/<queryId>").create() method. This method only creates the query doc if it doesn't already exist, so I avoid the race conditions between parallel requests that I would have if I checked for existing queries first.
Next, the request starts an onSnapshot listener on the query doc it attempted to create. The query has a status field which starts as created. The onSnapshot only resolves once that status has changed to complete.
I have an onCreate database trigger listening to "Queries/*". This database trigger handles the requested query and updates the query status to complete. In the case that the query already exists, the status is already in the complete state, so the onSnapshot resolves instantly.
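For illustration, here is a rough sketch of that create-then-listen pattern using the Python admin SDK instead of Node.js; the Queries path and status values follow the description above, while the helper name, timeout and client setup are assumptions, not the exact code used:

import threading
import firebase_admin
from firebase_admin import firestore
from google.api_core.exceptions import AlreadyExists

firebase_admin.initialize_app()
db = firestore.client()

def wait_for_query(query_id, timeout=120.0):
    doc_ref = db.document(f"Queries/{query_id}")
    try:
        # create() only succeeds if the doc doesn't exist yet, so parallel
        # requests can't both win the race.
        doc_ref.create({"status": "created"})
    except AlreadyExists:
        pass  # another request already registered this query

    done = threading.Event()

    def on_snapshot(snapshots, changes, read_time):
        # Resolve once the onCreate trigger has flipped the status to complete.
        for snap in snapshots:
            if snap.exists and snap.to_dict().get("status") == "complete":
                done.set()

    watch = doc_ref.on_snapshot(on_snapshot)
    try:
        if not done.wait(timeout):
            raise TimeoutError("CSV generation did not finish in time")
    finally:
        watch.unsubscribe()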

Upload 1Gb file through Logic app using sftp-ssh to Azure File share

I am using a Logic App to upload a 1 GB file as below:
Trigger - When files are added or modified (properties only)
Action1 - Get file content
Action 2- Create file(Azure fileshare)
Up to 35 MB, all triggers and actions work fine. Once the file uploaded to SFTP crosses 40 MB, the SFTP-SSH trigger and the first action still work, but when the workflow moves to the second action, 'Create file', it fails with the error 'The specified resource may be in use by an SMB client'. When I look at the Azure file share storage account, I see filename.partial.lock being created. I modified the access policy as well, but the issue persists.
Logic Apps is not designed to upload or download large amounts of data from a source/destination; it is a workflow solution that you design to meet your business need. However, you can still use the chunking functionality in Logic Apps to upload or download large files.
Please refer to:
https://learn.microsoft.com/en-us/azure/logic-apps/logic-apps-handle-large-messages#set-up-chunking
To upload a large file, make sure Allow chunking is enabled.
From your description, I suppose it should be a SharingViolation; you could check the error codes here.
And in the official doc, there are two scenarios that produce the Sharing Violation error:
Sharing Violation Due to File Access
Client A opens the file with FileAccess.Write and FileShare.Read
(denies subsequent Write/Delete while open).
Client B then opens the file with FileAccess.Write with
FileShare.Write (denies subsequent Read/Delete while open).
Result: Client B encounters a sharing violation since it specified a
file access that is denied by the share mode specified previously by
Client A.
Sharing Violation Due to Share Mode
Client A opens the file with FileAccess.Write and FileShare.Write
(denies subsequent Read/Delete while open).
Client B then opens the file with FileAccess.Write with
FileShare.Read (denies subsequent Write/Delete while open).
Result: Client B encounters a sharing violation since it specified a
share mode that denies write access to a file that is still open for
write access.
These are scenarios you need to consider. As another option, you could try using the REST API to upload the file and set Allow chunking in the HTTP action.
I think one more way to resolve the issue is a redesign, which could be a more scalable solution (a sketch follows this list):
Use a Logic App to get a notification when a file is added or modified in the SFTP/FTP location.
Once the file is added, read the file path for that file.
Create a Service Bus message and send the file path as the message content.
Create a Service Bus queue message-triggered Azure Function which listens to those messages (created in the previous step).
The Azure Function reads the file from SFTP in chunks using the file path.
This way you can read or write files of more than 30 GB.
This will be a more scalable solution, since Azure Functions auto-scale on demand.
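A rough sketch of the Azure Function part, assuming the Python programming model with a Service Bus queue trigger; the app settings, share name, 4 MiB chunk size and paramiko-based SFTP access are illustrative assumptions, not a tested implementation:

import os
import paramiko
import azure.functions as func
from azure.storage.fileshare import ShareFileClient

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB per range keeps memory usage flat

def main(msg: func.ServiceBusMessage) -> None:
    # The Logic App put the SFTP file path into the Service Bus message body.
    sftp_path = msg.get_body().decode("utf-8")

    transport = paramiko.Transport((os.environ["SFTP_HOST"], 22))
    transport.connect(username=os.environ["SFTP_USER"],
                      password=os.environ["SFTP_PASS"])
    sftp = paramiko.SFTPClient.from_transport(transport)

    dest = ShareFileClient.from_connection_string(
        os.environ["STORAGE_CONN"], share_name="uploads",
        file_path=os.path.basename(sftp_path))

    try:
        size = sftp.stat(sftp_path).st_size
        dest.create_file(size=size)            # pre-allocate the target file
        src = sftp.open(sftp_path, "rb")
        offset = 0
        while True:
            chunk = src.read(CHUNK_SIZE)       # stream one chunk at a time
            if not chunk:
                break
            dest.upload_range(chunk, offset=offset, length=len(chunk))
            offset += len(chunk)
        src.close()
    finally:
        sftp.close()
        transport.close()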
Thank you all. The reason for the error 'The specified resource may be in use by an SMB client' was that the file share was mounted on two Linux virtual machines. We unmounted it from the Linux VMs and did a fresh single mount, and that fixed the error.
MSFT has confirmed in our discussion that the 'Create file' action for Azure file share has a limitation of 100 or 300 MB. It was only able to work because the data arrives from SFTP in chunks. MSFT is working on returning a proper error message when the file size is beyond 100 or 300 MB. Below is quoted from the MSFT email:
"Thanks for the details , actually the Product team confirm to me that your flow is working by luck , it should not work with this sizes an they are working to implement the limits correctly to prevent files larger than maximum size which is maybe 300 or 100 “I am not yet sure ”
And this strange behavior is only happing when we are reading the content from SFTP chucked.
"

Is there a reason why I shouldn't set a size of blob to maximum size of 8TB?

I am connected to Azure blob storage and we need to upload files bigger than 256MB.
I followed the documentation, created a PageBlob, and then uploaded files through PutPage.
The problem is that I don't know in advance how big the data is going to be so I set it to the max size of 8TB.
Is there a reason why I shouldn't do this?
As far as I know, the max size is only the maximum possible size for the blob and shouldn't cause any issues with memory.
Correct me if I'm wrong please.
Thanks
In my experience, and according to the REST API reference for Put Page, the API requires the Content-Length header to be set to the exact number of bytes being transmitted in the request body, not an approximate or possible value.
Otherwise, Azure Storage will refuse the request and respond with an error after it has checked the validity of the request. I think that is the real reason for the errors you might get; see Common REST API Error Codes and Blob Service Error Codes.
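For reference, a minimal sketch of the page blob workflow from the question using the Python SDK (azure-storage-blob v12); the connection string, container and blob names are placeholders:

import os
from azure.storage.blob import BlobClient

blob = BlobClient.from_connection_string(
    os.environ["STORAGE_CONN"], container_name="data", blob_name="big.bin")

# Declare the maximum size of the page blob (8 TiB here); pages are written
# sparsely afterwards, and the size must be a multiple of 512 bytes.
blob.create_page_blob(size=8 * 1024 ** 4)

# Each Put Page call carries the exact Content-Length of the range being
# written (the SDK derives it from the data), and ranges must be 512-byte
# aligned.
chunk = b"\0" * (4 * 1024 * 1024)
blob.upload_page(chunk, offset=0, length=len(chunk))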

Is the Azure Put Blob operation atomic?

The documentation for Azure's Put Blob REST API operation tells us that it is possible to upload a block blob up to 64 MB with a single request.
I'm wondering whether such an operation is atomic. In particular I need to know whether the following assumptions are true or false.
If two or more clients concurrently attempt to put a particular non-existing blob using this API specifying If-None-Match: *, then at most one of them will succeed.
A blob put using this API will never be partially exposed. It will either not exist or exist with the entire content that was put (<64MB) including metadata.
Can anyone confirm or refute these assumptions?
I have received confirmation directly from a Microsoft support technician that both of these assumptions are true:
If two or more clients concurrently attempt to put a particular non-existing blob using this API specifying If-None-Match: *, then at most one of them will succeed.
A blob put using this API will never be partially exposed. It will either not exist or exist with the entire content that was put (<64MB) including metadata.
Is the Azure Put Blob operation atomic?
Answer: Not at all.
Any attempt to read the blob before the completion of step 3 would
result in HTTP 404 (not found).
Yes, you are 100% sure to receive a 404.
Any attempt to read the blob after the completion of step 3 would
either see the entire blob content and meta data, or result in HTTP
404 (not found) in case step 3 was not successful.
Yes, if the operation isn't complete there is no file in blob storage
Any attempt to put the blob with an If-None-Match: * header before the
start of step 2 would have to wait until step 3 is completed, either
successfully in which case the request must fail with HTTP 409
(precondition failed) or continue normally, since the blob would not
exist.
In my testing: there's no wait.
So, normally, after a second attempt to upload the same blob name you will receive HTTP/1.1 409 The specified blob already exists (only if you have sent the request with the If-None-Match: * header).
The problem is that if the first upload hasn't yet received its first 201 confirmation (or its only one, if you're uploading everything in a single request), then the second upload will be allowed to create the resource even though it was started after the first one. This tends to happen when the second file is shorter than the first, because the short upload can finish its transmission in a single request.
The weirdest thing is that when this happens, the first stream continues uploading data normally; only when its last request is emitted does it get a 409 answer.
I strongly recommend that you create a spike solution to test your specific use case, because the situation described above may not be a valid concern for your application.
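For example, a small spike along those lines could look like this with the Python SDK (azure-storage-blob v12); the container is assumed to exist and the names are placeholders. upload_blob(..., overwrite=False) sends If-None-Match: *, so at most one of the competing uploads should succeed:

import os
from concurrent.futures import ThreadPoolExecutor
from azure.core.exceptions import ResourceExistsError
from azure.storage.blob import BlobServiceClient

service = BlobServiceClient.from_connection_string(os.environ["STORAGE_CONN"])
container = service.get_container_client("spike-test")  # assumed to exist

def try_put(payload):
    blob = container.get_blob_client("race-test.bin")
    try:
        blob.upload_blob(payload, overwrite=False)  # conditional create
        return "created"
    except ResourceExistsError:
        return "409 blob already exists"

# Fire two competing uploads of different sizes, as described above.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(try_put, [b"x" * 10_000_000, b"y" * 10]))

print(results)  # expect exactly one "created"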

Azure Block Blob PUT fails when using HTTPS

I have written a Cygwin app that uploads (using the REST API PUT operation) Block Blobs to my Azure storage account, and it works well for different size blobs when using HTTP. However, use of SSL (i.e. PUT using HTTPS) fails for Blobs greater than 5.5MB. Blobs less than 5.5MB upload correctly. Anything greater and I find that the TCP session (as seen by Wireshark) reports a dwindling window size that goes to 0 once the aforementioned number of bytes are transferred. The failure is repeatable and consistent. As a point of reference, PUT operations against my Google/AWS/HP cloud storage accounts work fine when using HTTPS for various object sizes, which suggests the problem is not my client but specific to the HTTPS implementation on the MSAZURE storage servers.
If I upload the 5.5MB blob as two separate uploads of 4MB and 1.5MB followed by a PUT Block List, the operation succeeds as long as the two uploads used separate HTTPS sessions. Notice the emphasis on separate. That same operation fails if I attempt to maintain an HTTPS session across both uploads.
Any ideas on why I might be seeing this odd behavior with MS Azure? Same PUT operation with HTTPS works ok with AWS/Google/HP cloud storage servers.
Thank you for reporting this and we apologize for the inconvenience. We have managed to recreate the issue and have filed a bug. Unfortunately we cannot share a timeline for the fix at this time, but we will respond to this forum when the fix has been deployed. In the meantime, a plausible workaround (and a recommended best practice) is to break large uploads into smaller chunks (using the Put Block and Put Block List APIs), thus enabling the client to parallelize the upload.
This bug has now been fixed and the operation should now complete as expected.
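For anyone hitting a similar limit, a sketch of the recommended workaround (Put Block plus Put Block List) with the Python SDK (azure-storage-blob v12) might look like the following; the file, container and blob names and the 4 MiB block size are assumptions:

import base64
import os
from azure.storage.blob import BlobBlock, BlobClient

BLOCK_SIZE = 4 * 1024 * 1024  # 4 MiB per block

blob = BlobClient.from_connection_string(
    os.environ["STORAGE_CONN"], container_name="uploads", blob_name="big.bin")

block_ids = []
with open("big.bin", "rb") as src:
    index = 0
    while True:
        chunk = src.read(BLOCK_SIZE)
        if not chunk:
            break
        # Block IDs must be equal-length strings; use a base64-encoded counter.
        block_id = base64.b64encode(f"{index:08d}".encode()).decode()
        blob.stage_block(block_id=block_id, data=chunk)    # Put Block
        block_ids.append(BlobBlock(block_id=block_id))
        index += 1

blob.commit_block_list(block_ids)                          # Put Block List

Individual blocks can also be staged from separate threads or connections, which is the parallelization the answer refers to.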
