Is Azure Storage blob UploadText method atomic?

In a use case of mine, I am periodically overwriting a storage blob with the UploadText method, and the same blob is being read in parallel. I have the following questions:
Will the blob's LastModified time be updated before the UploadText method has written the complete data?
Can the data be partially exposed to any reader that is trying to read the blob content while UploadText is overwriting the same blob?

For Q1: No, the LastModified time will not be updated until the blob is committed (before that, the new data is in an uncommitted state).
For Q2: No. While the blob is being overwritten, the new content is in an uncommitted state, so any reader sees only the old content (the content before the update).
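For illustration, here is a minimal sketch of those commit semantics, using the current Azure.Storage.Blobs (v12) SDK rather than the legacy client the question uses; the connection string and names are placeholders. Staged data stays invisible to readers until the commit call:

using System;
using System.IO;
using System.Text;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Specialized;

// Placeholder connection string and names, for illustration only.
var container = new BlobContainerClient("<connection-string>", "mycontainer");
BlockBlobClient blob = container.GetBlockBlobClient("myblob.txt");

// Stage a block: the bytes reach the service but are NOT visible to readers,
// and LastModified is not updated yet.
string blockId = Convert.ToBase64String(Encoding.UTF8.GetBytes("block-0001"));
using (var ms = new MemoryStream(Encoding.UTF8.GetBytes("new content")))
{
    blob.StageBlock(blockId, ms);
}

// Readers still see the old committed content until this call, which
// atomically swaps in the new block list and bumps LastModified.
blob.CommitBlockList(new[] { blockId });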

Related

how to programmatically delete uncommitted blocks in azure storage using blob service?

We get the error 'The specified blob or block content is invalid' if we try to upload a block that is already present on the server. How do we clear those uncommitted blocks before the user retries uploading the same blob?
code:'InvalidBlobOrBlock'
message:'The specified blob or block content is invalid.\nRequestId:1b015a55-201e-00be-7b1c-7e8fb8000000\nTime:2021-07-21T10:35:48.0075829Z'
name:'StorageError'
requestId:'1b015a55-201e-00be-7b1c-7e8fb8000000'
stack:'StorageError: The specified blob or block content is invalid
statusCode:400
how to clear those uncommitted blocks before user retries to upload the same blob?
AFAIK, there's no direct way to delete uncommitted blocks. The simplest way would be to create a blob with the same name and then delete that blob. When creating the blob with the same name, please ensure that it is uploaded without splitting the contents into blocks.
I wrote a blog post about this error some time back that you may find useful: https://gauravmantri.com/2013/05/18/windows-azure-blob-storage-dealing-with-the-specified-blob-or-block-content-is-invalid-error/.
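A minimal sketch of that workaround, assuming the v12 Azure.Storage.Blobs SDK (connection string and names are placeholders): a small single-shot upload replaces any uncommitted block list under that name, and the delete then clears the slate for the retry.

using System;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

var container = new BlobContainerClient("<connection-string>", "mycontainer");
BlobClient blob = container.GetBlobClient("stuck-blob.dat");

// A small payload goes out as a single Put Blob (no block list), which
// discards any uncommitted blocks previously staged under this name.
blob.Upload(BinaryData.FromString("placeholder"), overwrite: true);

// Delete the placeholder so the user can retry the real upload cleanly.
blob.Delete(DeleteSnapshotsOption.IncludeSnapshots);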

Re-play/Repeat/Re-Fire Azure BlobStorage Function Triggers for existing files

I've just uploaded several 10s of GBs of files to Azure CloudStorage.
Each file should get picked up and processed by a FunctionApp, in response to a BlobTrigger:
[FunctionName(nameof(ImportDataFile))]
public async Task ImportDataFile(
    // Raw JSON Text file containing data updates in expected schema
    [BlobTrigger("%AzureStorage:DataFileBlobContainer%/{fileName}", Connection = "AzureStorage:ConnectionString")]
    Stream blobStream,
    string fileName)
{
    //...
}
This works in general, but foolishly, I did not do a final test of that Function prior to uploading all the files to our UAT system ... and there was a problem with the uploads :(
The upload took a few days (running over my Domestic internet uplink due to CoViD-19) so I really don't want to have to re-do that.
Is there some way to "replay" the BlobUpload triggers, so that the function triggers again as if I'd just re-uploaded the files ... without having to transfer any data again?
As per this link:
Azure Functions stores blob receipts in a container named azure-webjobs-hosts in the Azure storage account for your function app (defined by the app setting AzureWebJobsStorage).
To force reprocessing of a blob, delete the blob receipt for that blob from the azure-webjobs-hosts container manually. While reprocessing might not occur immediately, it's guaranteed to occur at a later point in time. To reprocess immediately, the scaninfo blob in azure-webjobs-hosts/blobscaninfo can be updated. Any blobs with a last modified timestamp after the LatestScan property will be scanned again.
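If there are too many receipts to delete by hand, something like the following sketch can do it programmatically. It assumes the v12 SDK; the exact receipt path layout under blobreceipts/ depends on the host and function IDs, so this filters on the blob name rather than constructing the full path, and the connection string must point at the AzureWebJobsStorage account.

using Azure.Storage.Blobs;

var hosts = new BlobContainerClient("<AzureWebJobsStorage-connection-string>", "azure-webjobs-hosts");

foreach (var receipt in hosts.GetBlobs(prefix: "blobreceipts/"))
{
    // Receipt names end with the triggering container and blob name;
    // match on whatever identifies the files you want reprocessed.
    if (receipt.Name.EndsWith(".json")) // hypothetical filter for the data files
    {
        hosts.DeleteBlob(receipt.Name);
    }
}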
I found a hacky-AF workaround that re-processes the existing file:
If you add Metadata to a blob, that appears to re-trigger the BlobStorage Function Trigger.
This can be done in Azure Storage Explorer by right-clicking on a Blob > Properties > Add Metadata.
I was setting Key: "ForceRefresh", Value: "test".
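The same metadata trick can be scripted against many blobs at once; a sketch with the v12 SDK, with placeholders for the connection string and names:

using System.Collections.Generic;
using Azure.Storage.Blobs;

var blob = new BlobClient("<connection-string>", "datafiles", "import-001.json");

// Writing metadata bumps the blob's LastModified, which re-fires the trigger.
blob.SetMetadata(new Dictionary<string, string> { ["ForceRefresh"] = "test" });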
I had a problem with the processing of blobs in my code which meant that there were a bunch of messages in the webjobs-blobtrigger-poison queue. I had to move them back to azure-webjobs-blobtrigger-name-of-function-app. Removing the blob receipts and adjusting the scaninfo blob did not work without the above step.
Fortunately Azure Storage Explorer has a menu option to move the messages from one queue to another.
I found a workaround, if you aren't invested in the file name:
Azure Storage Explorer has a "Clone with new name" button in the top bar, which will add a new file (and trigger the Function) without transferring the data via your local machine.
Note that "Copy" followed by "Paste" also re-triggers the blob, but appears to transfer the data down to your machine and then back up again ... incredibly slowly!

Azure blob versioning

Is there a way I can version the blobs being stored in Azure storage account, so that the blobs can be picked up using their version or the latest blob can be picked up?
Versioning for blobs is accomplished by taking a snapshot of a blob, which creates a read-only copy of the blob based on its contents at the time the snapshot was taken.
When a snapshot of a blob is taken, Azure Storage returns a date/time value identifying that snapshot. You can access that version of the blob by appending this value to the blob's URL, e.g. https://myaccount.blob.core.windows.net/mycontainer/myblob?snapshot=2017-06-09T00:00:00.0000000Z
However, this snapshot date/time value is not stored anywhere in Azure.
What you could do is store this date/time value in your database, and whenever you need to present this version of the blob in your application, simply append this value to the blob's URL.
Please note that snapshots exist along with the blob, i.e. if you delete the base blob, all snapshots of that blob will also be deleted.
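A sketch of that flow with the v12 SDK (names are placeholders): take the snapshot, persist the returned timestamp, and use it later to read that exact version.

using System;
using Azure.Storage.Blobs;

var blob = new BlobClient("<connection-string>", "mycontainer", "myblob");

// Take a snapshot; the returned timestamp is the only handle to this version,
// so store it in your database alongside the blob record.
string snapshotTime = blob.CreateSnapshot().Value.Snapshot;

// Later: read that exact version. WithSnapshot appends ?snapshot=<value>
// to the blob's URL for you.
BlobClient versioned = blob.WithSnapshot(snapshotTime);
BinaryData content = versioned.DownloadContent().Value.Content;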

Check if Blob of unknown Blob type exists

I've inherited a project built using the Azure Storage Client 1.7 and am upgrading it as Microsoft have announced that this will no longer be supported from December this year.
References to the files in Blob storage are stored in a database with the following fields:
FilePath - a string in the form of uploadfiles/xxx/yyy/Image-20140117170146.jpg
FileURI - A string in the form of https://zzz.blob.core.windows.net/uploadfiles/xxx/yyy/Image-20140117170146.jpg
GetBlobReferenceFromServer will throw an exception if the file doesn't exist, so it seems you should use GetBlockBlobReference if you know the container and the Blob type.
So my question(s):
Can I assume any Blobs currently uploaded (using StorageClient 1.7) will be BlockBlobs?
As I need to know the container name to call GetBlockBlobReference can I reliably say that in the examples above my container would always be uploadfiles
Can I assume any Blobs currently uploaded (using StorageClient 1.7) will be BlockBlobs?
Though you can't be 100% sure that the blobs uploaded via Storage Client library 1.7 are Block Blobs (because 1.7 also supported Page Blobs), you can make some intelligent guesses. For example, if the files are image files or other commonly used files (pdf, document, etc.), you can assume that they are block blobs. Typically you would see vhd files uploaded as page blobs. Again, if these were uploaded by the users of your application, more than likely they are block blobs.
Having said this, I think you should use the GetBlobReferenceFromServer method. What you could do is list all blobs from the database and call GetBlobReferenceFromServer for each of them. If the blob exists, you will get the blob type; if it doesn't, the method will throw an error. This would be the quickest way to identify the blob type of existing entries in the database. If you find both block and page blobs, you could store the blob type back in the database along with the existing record, so that if in future you need to decide between creating a CloudBlockBlob or CloudPageBlob reference, you can look at this field.
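A sketch of that pass, using the WindowsAzure.Storage client that replaced StorageClient 1.7 (the URI here is the FileURI value straight from the database record):

using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

var account = CloudStorageAccount.Parse("<connection-string>");
CloudBlobClient client = account.CreateCloudBlobClient();

try
{
    // One round trip: fetches the blob's attributes, so this tells you
    // both that the blob exists and what type it is.
    ICloudBlob blob = client.GetBlobReferenceFromServer(
        new Uri("https://zzz.blob.core.windows.net/uploadfiles/xxx/yyy/Image-20140117170146.jpg"));
    Console.WriteLine(blob.Properties.BlobType); // BlockBlob or PageBlob: store this back in the DB
}
catch (StorageException ex) when (ex.RequestInformation.HttpStatusCode == 404)
{
    Console.WriteLine("Blob does not exist");
}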
As I need to know the container name to call GetBlockBlobReference, can I reliably say that in the examples above my container would always be uploadfiles?
Yes. In the examples you listed above, you can say that the blob container is uploadfiles.

Upload to Blob Storage while overwriting existing Blob

When using the StorageClient and calling UploadByteArray() with a blob name that already exists, will this cause any data corruption? (In other words, do I have to call Delete() before uploading, which would cost another transaction?)
You should just be able to call UploadByteArray() without having to do the delete first, and it will overwrite the blob that is already there.
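For reference, a one-call sketch; it is shown with the current v12 SDK, where the overwrite is an explicit flag, since UploadByteArray() belongs to the legacy StorageClient. Names are placeholders.

using System;
using Azure.Storage.Blobs;

var blob = new BlobClient("<connection-string>", "mycontainer", "existing-blob.bin");

byte[] payload = { 0x01, 0x02, 0x03 };

// One call, one transaction: no prior Delete() needed. The new content
// replaces the old atomically when the upload is committed.
blob.Upload(BinaryData.FromBytes(payload), overwrite: true);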
