Tracking changes to data in Azure Blob Storage

Azure tracks activities through the activity log, but I am working on a use case in which I have to track changes to a JSON file stored in Azure Blob Storage, and I need to figure out how to do that.

You can enable Blob storage versioning to automatically keep track of changes to a file in a blob. When blob versioning is enabled, you can access earlier versions of a blob to recover your data if it is modified or deleted.
Each blob version is identified by a unique version ID. The value of the version ID is the timestamp at which the blob was updated. The version ID is assigned at the time that the version is created.
When you call a write operation to create or modify a blob, Azure Storage returns the x-ms-version-id header in the response. This header contains the version ID for the current version of the blob that was created by the write operation.
The version ID remains the same for the lifetime of the version.
Reference: https://learn.microsoft.com/en-us/azure/storage/blobs/versioning-overview
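As a minimal sketch of how versioning could be used to track changes to the JSON file, the Python code below assumes the v12 azure-storage-blob package and a storage account that already has versioning enabled. It lists the versions of one blob (include=["versions"]), downloads each one by version ID, and reports which JSON values changed between consecutive versions. The function names and the diff format are hypothetical, for illustration only.

```python
from __future__ import annotations

import json
from typing import Any, Iterator


def diff_json(old: Any, new: Any, path: str = "$") -> Iterator[tuple[str, Any, Any]]:
    """Yield (path, old_value, new_value) for every leaf value that differs.

    Nested objects are compared key by key; a key missing on one side is
    reported as None on that side (so in this sketch a missing key and an
    explicit null look the same).
    """
    if isinstance(old, dict) and isinstance(new, dict):
        for key in sorted(set(old) | set(new)):
            yield from diff_json(old.get(key), new.get(key), f"{path}.{key}")
    elif old != new:
        yield path, old, new


def version_history(container_client, blob_name: str) -> list[tuple[str, Any]]:
    """Return (version_id, parsed JSON) pairs for one blob, oldest first.

    container_client is assumed to be an azure-storage-blob (v12) ContainerClient.
    """
    version_ids = [
        item.version_id
        for item in container_client.list_blobs(name_starts_with=blob_name, include=["versions"])
        if item.name == blob_name  # the prefix filter could also match e.g. "config.json.bak"
    ]
    blob = container_client.get_blob_client(blob_name)
    history = []
    # Version IDs are ISO 8601 timestamps, so sorting the strings sorts them by time.
    for version_id in sorted(version_ids):
        data = blob.download_blob(version_id=version_id).readall()
        history.append((version_id, json.loads(data)))
    return history


def print_changes(history: list[tuple[str, Any]]) -> None:
    """Print what changed between each pair of consecutive versions."""
    for (_, old), (new_id, new) in zip(history, history[1:]):
        for path, before, after in diff_json(old, new):
            print(f"{new_id}: {path}: {before!r} -> {after!r}")
```

Calling print_changes(version_history(container_client, "config.json")) would then print one line per changed value, labelled with the version ID in which the change first appeared.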

Related

How to delete a leased blob in Azure?

I am publishing to and subscribing from an Azure Event Hub, which uses a blob in a container in a storage account. Messages are not published when I use this storage account, but publishing works with another storage account.
I can see that the blob's lease status is leased. I think deleting the blob and creating it again may solve the issue, so I tried to delete it and create a new one, but I am not able to delete it. I also tried breaking the lease, but the lease status goes back to leased.
Is there any way to solve this issue?
• I tried to reproduce your scenario: I created a blob container, uploaded a blob to it, acquired a lease on the blob through the REST API, broke the lease, and finally deleted the blob through the REST API, all successfully. I used Postman as the REST API client, and an application registered in Azure AD to obtain the token required for the blob operations. The steps were:
a) A lease was acquired on blob ‘ACMx7.pdf’ with the appropriate blob owner and user authorization and header parameters.
b) The lease on blob ‘ACMx7.pdf’ was broken by passing the header x-ms-lease-action: break.
c) Blob ‘ACMx7.pdf’ was deleted after the lease had been broken, by passing the required headers in Postman.
Note that the lease on the blob was acquired for an infinite period. The documentation below describes the headers required for each action on the blob:
https://learn.microsoft.com/en-us/rest/api/storageservices/lease-blob
https://learn.microsoft.com/en-us/rest/api/storageservices/delete-blob
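The three requests from the walkthrough can be sketched with Python's standard library as below. This is an illustration under assumptions, not the Postman setup from the answer: the bearer token is assumed to come from the Azure AD app registration mentioned above, the x-ms-version value 2020-10-02 is simply a recent service version, and the helper names are hypothetical. The functions only build the requests; send() performs one.

```python
import urllib.request
from email.utils import formatdate

# Assumption: any service version from 2017-11-09 onward accepts Azure AD bearer tokens.
API_VERSION = "2020-10-02"


def _request(method: str, url: str, token: str, extra_headers: dict) -> urllib.request.Request:
    headers = {
        "Authorization": f"Bearer {token}",    # token for the Azure AD app registration
        "x-ms-version": API_VERSION,
        "x-ms-date": formatdate(usegmt=True),  # RFC 1123 date in GMT
        **extra_headers,
    }
    # An empty body makes urllib send Content-Length: 0 on the PUT requests.
    data = b"" if method == "PUT" else None
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def acquire_infinite_lease(blob_url: str, token: str) -> urllib.request.Request:
    # x-ms-lease-duration: -1 requests a lease that never expires on its own.
    return _request("PUT", f"{blob_url}?comp=lease", token,
                    {"x-ms-lease-action": "acquire", "x-ms-lease-duration": "-1"})


def break_lease(blob_url: str, token: str) -> urllib.request.Request:
    # A break period of 0 asks for the lease to be broken immediately.
    return _request("PUT", f"{blob_url}?comp=lease", token,
                    {"x-ms-lease-action": "break", "x-ms-lease-break-period": "0"})


def delete_blob(blob_url: str, token: str) -> urllib.request.Request:
    return _request("DELETE", blob_url, token, {})


def send(request: urllib.request.Request) -> int:
    """Send one request and return the HTTP status (urllib raises HTTPError on 4xx/5xx)."""
    with urllib.request.urlopen(request) as response:
        return response.status
```

For example, send(break_lease(url, token)) followed by send(delete_blob(url, token)) mirrors steps b) and c) above.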

Azure blob snapshots not getting deleted via logic app

While deleting old blobs with a logic app by giving it the container path, we ran into the error message "Status code: 409, "message": This operation is not permitted because the blob has snapshots". This causes the logic app run to fail. I tried using Delete blob by providing both the Id and the file name, but the error persists. Is there any way to delete a blob together with its snapshots using the logic app? Any approach to solving the issue is welcome. A blob lifecycle management policy does not work for us.
You can use an Azure Function to delete your blob including this header in your request:
x-ms-delete-snapshots: {include, only}
Required if the blob has associated snapshots. Specify one of the following two options:
include: Delete the base blob and all of its snapshots.
only: Delete only the blob's snapshots and not the blob itself.
This header should be specified only for a request against the base blob resource. If this header is specified on a request to delete an individual snapshot, the Blob service returns status code 400 (Bad Request).
If this header is not specified on the request and the blob has associated snapshots, the Blob service returns status code 409 (Conflict).
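See the Delete Blob REST API documentation: https://learn.microsoft.com/en-us/rest/api/storageservices/delete-blob
Alternatively, within your Logic App you can filter and order your blobs so that the snapshots are deleted first, before you remove the base blob.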
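The Azure Function suggested above could run something like the Python sketch below, assuming the v12 azure-storage-blob package. In that SDK, passing delete_snapshots="include" to delete_blob sets the x-ms-delete-snapshots: include header described above. The age-based filter (max_age) is a hypothetical stand-in for however you currently decide which blobs are old.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional


def delete_old_blobs(container_client, max_age: timedelta,
                     now: Optional[datetime] = None) -> List[str]:
    """Delete base blobs older than max_age together with all of their snapshots.

    container_client is assumed to be an azure-storage-blob (v12) ContainerClient.
    Returns the names of the deleted blobs.
    """
    cutoff = (now or datetime.now(timezone.utc)) - max_age
    # list_blobs() returns only base blobs unless snapshots are explicitly included,
    # so each blob is visited once. Collect the names first, then delete.
    stale = [b.name for b in container_client.list_blobs() if b.last_modified < cutoff]
    for name in stale:
        # delete_snapshots="include" sends x-ms-delete-snapshots: include,
        # which avoids the 409 "blob has snapshots" error.
        container_client.delete_blob(name, delete_snapshots="include")
    return stale
```

Using "only" instead of "include" would keep the base blobs and remove just their snapshots, matching the second option of the header.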

Azure: Unable to copy Archive blobs from one storage account to another?

Whenever I try to copy Archive-tier blobs to a different storage account and change their tier in the destination, I get the following error:
Copy source blob has been modified. ErrorCode: CannotVerifyCopySource
I have tried copying Hot/Cool blobs to Hot/Cool/Archive, and I face the issue only when copying from Archive to Hot/Cool/Archive. Also, there is no issue when copying within the same storage account.
I am using the Azure Python SDK:
blob_url = source_block_blob_service.make_blob_url(copy_from_container, blob_name, sas_token=sas)
dest_blob_service.copy_blob(copy_to_container, blob_name, blob_url, requires_sync=True, standard_blob_tier='Hot')
You're getting this error because copying an archived blob is only supported within the same storage account, and you're trying to copy it to a different storage account.
From the REST API documentation page:
Copying Archived Blob (version 2018-11-09 and newer)
An archived blob can be copied to a new blob within the same storage account. This will still leave the initially archived blob as is. When copying an archived blob as source the request must contain the header x-ms-access-tier indicating the tier of the destination blob. The data will be eventually copied to the destination blob.
While a blob is in the archive access tier, it's considered offline and can't be read or modified.
https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blob-rehydration
To read the blob, you need to either rehydrate it first or, as described in the link above, use the Copy Blob operation. I am not sure whether the Python SDK's copy_blob() operation uses that API behind the scenes; perhaps not, if it did not work that way for you.
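Because copying an archived blob is only supported within the same account, one possible workaround is to make an online copy inside the source account first and then copy that copy across accounts. The Python sketch below assumes the v12 azure-storage-blob package (BlobClient.start_copy_from_url with standard_blob_tier and rehydrate_priority), assumes the source URLs carry a SAS token, and assumes the staging copy reports a pending copy status until rehydration finishes, which can take hours. The function names and the staging-blob approach are illustrative, not an official procedure.

```python
import time
from typing import Callable


def wait_for_copy(blob_client, poll_seconds: float = 600.0,
                  sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll a destination blob until its copy is no longer pending; return the final status."""
    while True:
        status = blob_client.get_blob_properties().copy.status
        if status != "pending":
            return status  # expected: "success", "aborted" or "failed"
        sleep(poll_seconds)


def copy_archived_blob(archived_source_url: str, staging_blob, dest_blob,
                       dest_tier: str = "Hot",
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Copy an archived blob to another account through an online copy in the source account.

    staging_blob: BlobClient for a new blob in the SAME account as the archived source.
    dest_blob:    BlobClient for the target blob in the other account.
    staging_blob.url must authorize reads (e.g. the client was created from a SAS URL).
    """
    # Step 1: same-account copy. An archived source requires an explicit destination
    # tier (the x-ms-access-tier header), which standard_blob_tier supplies. An online
    # tier is used here so that step 2 can read the result.
    staging_blob.start_copy_from_url(
        archived_source_url, standard_blob_tier="Hot", rehydrate_priority="Standard")
    if wait_for_copy(staging_blob, sleep=sleep) != "success":
        raise RuntimeError("the rehydrating copy inside the source account did not succeed")

    # Step 2: the staging blob is online, so it can be copied to the other account.
    dest_blob.start_copy_from_url(staging_blob.url, standard_blob_tier=dest_tier)
    status = wait_for_copy(dest_blob, sleep=sleep)
    if status == "success":
        staging_blob.delete_blob()  # the staging copy is no longer needed
    return status
```

The sleep parameter exists only so the polling can be replaced (for example in tests); in normal use the default time.sleep applies.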

Azure blob versioning

Is there a way to version the blobs stored in an Azure storage account, so that a blob can be retrieved by its version, or the latest version can be retrieved?
Versioning for blobs is accomplished by taking a snapshot of a blob, which creates a read-only copy of the blob based on its contents at the time the snapshot was taken.
When a snapshot for a blob is taken, Azure Storage returns a date/time value when the snapshot was taken. You can access that blob by appending this value to the blob's URL e.g. https://myaccount.blob.core.windows.net/mycontainer/myblob?snapshot=2017-06-09T00:00:00.0000000Z
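However, Azure does not map these snapshot date/time values to any version label of yours; it only lets you enumerate a blob's snapshots (for example, by listing blobs with snapshots included).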
What you could do is store this date/time value in your database and whenever you need to present this version of the blob in your application, you can simply append this value to the blob's URL.
Please note that snapshots exist only alongside their base blob: you cannot delete the base blob without also deleting all of its snapshots (Delete Blob returns 409 Conflict unless x-ms-delete-snapshots is specified).
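The workflow above (take a snapshot, keep the returned timestamp in your own store, append it to the blob URL later) can be sketched in Python with the v12 azure-storage-blob package. The versions dictionary is a hypothetical stand-in for your database table, and the labels are whatever version names your application uses.

```python
from typing import Dict, Optional


def snapshot_url(blob_url: str, snapshot: str) -> str:
    """Append a snapshot timestamp to a blob URL, e.g. ...?snapshot=2017-06-09T00:00:00.0000000Z."""
    separator = "&" if "?" in blob_url else "?"  # keep an existing SAS query string intact
    return f"{blob_url}{separator}snapshot={snapshot}"


def save_version(blob_client, label: str, versions: Dict[str, str]) -> str:
    """Snapshot the blob and record the returned timestamp under an application label."""
    snapshot = blob_client.create_snapshot()["snapshot"]
    versions[label] = snapshot  # stand-in for a row in your own database
    return snapshot


def open_version(container_client, blob_name: str, versions: Dict[str, str],
                 label: Optional[str] = None):
    """Return a BlobClient for a labelled version, or for the base (latest) blob if label is None."""
    snapshot = versions[label] if label is not None else None
    return container_client.get_blob_client(blob_name, snapshot=snapshot)
```

Passing label=None returns the base blob, which always holds the latest content, so the "latest version" case needs no stored timestamp.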

Check if Blob of unknown Blob type exists

I've inherited a project built using the Azure Storage Client 1.7 and am upgrading it as Microsoft have announced that this will no longer be supported from December this year.
References to the files in Blob storage are stored in a database with the following fields:
FilePath - a string in the form of uploadfiles/xxx/yyy/Image-20140117170146.jpg
FileURI - A string in the form of https://zzz.blob.core.windows.net/uploadfiles/xxx/yyy/Image-20140117170146.jpg
GetBlobReferenceFromServer will throw an exception if the file doesn't exist, so it seems you should use GetBlockBlobReference if you know the container and the Blob type.
So my question(s):
Can I assume any Blobs currently uploaded (using StorageClient 1.7) will be BlockBlobs?
As I need to know the container name to call GetBlockBlobReference, can I reliably say that in the examples above my container would always be uploadfiles?
Can I assume any Blobs currently uploaded (using StorageClient 1.7) will be BlockBlobs?
You can't be 100% sure that the blobs uploaded via Storage Client library 1.7 are block blobs, because 1.7 also supported page blobs; however, you can make some educated guesses. For example, if the files are images or other common file types (PDFs, documents, etc.), you can assume that they are block blobs. Page blobs are typically used for VHD files. Also, if these files were uploaded by the users of your application, they are more than likely block blobs.
Having said this, I think you should use the GetBlobReferenceFromServer method. Read all blob references from the database and call GetBlobReferenceFromServer for each of them. If the blob exists, you get its blob type; if it doesn't, the method throws an error. This is the quickest way to identify the blob type of the existing entries in the database. If you find both block and page blobs, you can also store the blob type in the database alongside each record, so that when you later need to choose between creating a CloudBlockBlob or a CloudPageBlob reference, you can look at this field.
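The same check can be sketched in Python with the v12 azure-storage-blob package, where BlobClient.exists() and get_blob_properties().blob_type play the role of GetBlobReferenceFromServer. The record shape (ID plus FileURI) and the function name are hypothetical, and the .NET 1.7 library itself is not shown.

```python
from typing import Any, Callable, Dict, Iterable, Optional, Tuple


def classify_blobs(records: Iterable[Tuple[int, str]],
                   client_for_uri: Callable[[str], Any]) -> Dict[int, Optional[str]]:
    """Map each database record ID to its blob type, or None if the blob is missing.

    records: (record_id, FileURI) pairs read from the database (hypothetical shape).
    client_for_uri: builds a v12 BlobClient from a URI, for example
        functools.partial(BlobClient.from_blob_url, credential=credential).
    """
    types: Dict[int, Optional[str]] = {}
    for record_id, uri in records:
        blob = client_for_uri(uri)
        # exists() avoids catching ResourceNotFoundError, at the cost of an extra request.
        if not blob.exists():
            types[record_id] = None
            continue
        blob_type = blob.get_blob_properties().blob_type
        # BlobType is a str-based enum ("BlockBlob", "PageBlob", "AppendBlob"); store its value.
        types[record_id] = getattr(blob_type, "value", blob_type)
    return types
```

The returned mapping can then be written back to the database as the extra blob-type column suggested above.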
As I need to know the container name to call GetBlockBlobReference, can I reliably say that in the examples above my container would always be uploadfiles?
Yes. In the examples you listed above, the blob container is uploadfiles.
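To verify the container name across all existing records rather than just the two examples, note that for the standard endpoint the first path segment of each value is the container. A small Python sketch under that assumption (the function name is hypothetical):

```python
from typing import Tuple
from urllib.parse import unquote, urlparse


def split_blob_path(file_uri_or_path: str) -> Tuple[str, str]:
    """Split a FileURI or FilePath value into (container, blob name).

    Assumes the standard https://<account>.blob.core.windows.net/<container>/<blob>
    layout. Emulator URLs (which put the account name in the path) and blobs
    addressed through the $root container without a container segment are not handled.
    """
    if "://" in file_uri_or_path:
        path = unquote(urlparse(file_uri_or_path).path)
    else:
        path = file_uri_or_path
    container, _, blob_name = path.lstrip("/").partition("/")
    return container, blob_name
```

Collecting the containers returned for every FilePath and FileURI in the database should yield only "uploadfiles" if the assumption holds for your data.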
