Cold storage/WORM on Azure?

I have a requirement that states certain data in my application must be write once, read many. Is there a construct or best practice approach on Azure that would facilitate this?

We do not have a documented best practice on this topic, but I can think of two options that might work:
Option 1: Use an Append Blob, a new blob type which is currently available with the newest storage service version. All writes to an append blob happen at the end of the blob; updating and deleting existing blocks is not supported. To modify an append blob, you add blocks to the end of the blob via the new Append Block operation, and each appended block is accessible immediately (see the sketch below). http://blogs.msdn.com/b/windowsazurestorage/archive/2015/04/13/introducing-azure-storage-append-blob.aspx has more information about append blobs.
Option 2: You can take snapshots of your blobs very frequently, as snapshots are read-only copies of your original blobs. However, someone with authorized access to the snapshots will be able to delete them.
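For illustration, here is a minimal sketch of Option 1's append-only write pattern using the current azure-storage-blob Python SDK; the connection-string environment variable and the container/blob names are assumptions, not part of the original answer.

```python
# Hedged sketch: append-only writes to an Append Blob (names are examples).
import os
from azure.storage.blob import BlobClient

blob = BlobClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"],
    container_name="audit",
    blob_name="audit-log.txt",
)

# Create the (empty) append blob once.
blob.create_append_blob()

# Writes can only go to the end of the blob; existing blocks cannot be
# modified or deleted, which gives the write-once, read-many behavior.
blob.append_block(b"2015-04-13T12:00:00Z event recorded\n")
```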

Update: Azure now offers cool blob storage; see http://www.zdnet.com/article/microsoft-launches-cool-blob-azure-storage-at-1c-per-gb/. Hope this helps.

Related

Azure Event Grid: Find out if a blob is being overwritten

I have wired up an EventGridTrigger to Azure Functions and am listening for Microsoft.Storage.BlobCreated events. So far, it seems to be working fine. However, I have observed that multiple events are triggered for the same blob if the client overwrites it. I need to do some server-side processing only once per blob creation. Is there any metadata available to us to see how many times a blob has been overwritten?
As a workaround, I'm thinking of saving the blob URI to a Cosmos container as a primary key to see if it's ever been processed before, but this sounds like overkill for something this trivial.
Here are three possible solutions:
1. Store the blob URI in an Azure Table (same as your suggestion with Cosmos, but cheaper); a sketch follows below.
2. Turn on blob versioning and check that it is the first version of the blob before processing. Turning on versioning has a cost.
3. Check the code that updates the blob. Is it possible to use the BlockBlobClient and only update the affected block? Would that avoid BlobCreated being triggered on update?
I would start by checking whether option 3 is possible. If not, go with option 1.
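A hedged sketch of option 1 using azure-data-tables inside an Event Grid-triggered function (Python v1 programming model, trigger binding configured in function.json); the table name, environment variable, and process_blob helper are hypothetical.

```python
# Sketch: dedupe BlobCreated events with an Azure Table.
# Assumes an existing table named "processedblobs" and a connection string in
# the AZURE_STORAGE_CONNECTION_STRING app setting.
import hashlib
import os

import azure.functions as func
from azure.core.exceptions import ResourceExistsError
from azure.data.tables import TableClient


def main(event: func.EventGridEvent):
    blob_url = event.get_json()["url"]  # BlobCreated event data includes the blob URL

    table = TableClient.from_connection_string(
        os.environ["AZURE_STORAGE_CONNECTION_STRING"], table_name="processedblobs"
    )

    # Use a hash of the URL as the row key; table keys cannot contain '/' or '?'.
    key = hashlib.sha256(blob_url.encode()).hexdigest()
    try:
        # Insert fails with ResourceExistsError if we've already seen this blob.
        table.create_entity({"PartitionKey": "blobs", "RowKey": key, "Url": blob_url})
    except ResourceExistsError:
        return  # Overwrite of an already-processed blob; skip.

    process_blob(blob_url)  # hypothetical downstream processing


def process_blob(url: str):
    ...
```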

Azure ZRS/GRS vs snapshots

Why would I need to create a blob snapshot and incur additional cost if Azure already provides GRS (geo-redundant storage) or ZRS (zone-redundant storage)?
Redundancy (ZRS/GRS/RA-GRS) provides a means to achieve high availability of your resources (blobs in your scenario). By enabling redundancy you ensure that a copy of your blob is available in another region/zone in case the primary region/zone is not available. It also protects against data corruption of the primary blob.
When you take a snapshot of your blob, a read-only copy of that blob in its current state is created and stored. If needed, you can restore a blob from a snapshot. This scenario is well suited if you want to keep different versions of the same blob.
However, please keep in mind that neither redundancy nor snapshots are a backup: if you delete the base blob, all the snapshots associated with that blob are deleted, and all the copies of that blob in other zones/regions are deleted as well.
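To make the snapshot point concrete, here is a minimal sketch of creating a snapshot and later restoring the base blob from it, using the azure-storage-blob Python SDK; the container and blob names are examples, not from the original answer.

```python
# Hedged sketch: take a snapshot of a blob, then restore the base blob from it.
import os
from azure.storage.blob import BlobClient

blob = BlobClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"],
    container_name="data",
    blob_name="report.json",
)

# Create a read-only, point-in-time snapshot of the current blob contents.
snapshot = blob.create_snapshot()  # returns a dict including the snapshot timestamp

# Restore: copy the snapshot's contents back over the base blob.
snapshot_url = f"{blob.url}?snapshot={snapshot['snapshot']}"
blob.start_copy_from_url(snapshot_url)
```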
I guess you need to understand the difference between Backup and Redundancy.
Backups make sure if something is lost, corrupted or stolen, that a copy of the data is available at your disposal.
Redundancy makes sure that if something fails (your computer dies, a drive gets fried, or a server freezes), you are able to keep working regardless of the problem. Redundancy means that all your changes are replicated to another location. In case of a failover, the secondary can theoretically act as the primary and serve the (hopefully) latest state of your file system.
You could also turn on soft delete. That keeps a copy of every blob for every change made to it, even if someone deletes it. You then set a retention period so those copies are automatically removed after some period of time.
https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blob-soft-delete
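A hedged sketch of enabling soft delete programmatically with the azure-storage-blob SDK (it can equally be done in the portal); the 14-day retention window is an arbitrary example.

```python
# Sketch: enable blob soft delete with a 14-day retention window.
import os
from azure.storage.blob import BlobServiceClient, RetentionPolicy

service = BlobServiceClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"]
)

# Deleted blob data is retained for `days` days and can be undeleted
# during that window.
service.set_service_properties(
    delete_retention_policy=RetentionPolicy(enabled=True, days=14)
)
```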

Azure Blob Storage: How do I batch-create a snapshot for all blobs in a given container?

I want to snapshot every blob (of 1000s) in a given container (by way of backup). The only way via the UI seems to be to do this blob-by-blob. Storage Viewer at least allows for a few at a time.
What is the most efficient way to achieve this?
Must I really resort to a for loop? :\
Can AzCopy or azure-cli help here..?
Thanks as ever
Must I really resort to a for loop? :\
Unfortunately yes. That's the only way to accomplish it, as snapshots are taken at the blob level. You will need to list all blobs in the blob container and take a snapshot of each blob separately.
If you're looking for a tool to do this for you, may I suggest you take a look at Cerebrata Cerulean (disclosure: my company is behind this tool). It has a feature where you can take snapshots of all blobs in a container with a single click (it does so by listing all blobs first and then taking a snapshot of each blob individually).
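For completeness, a minimal sketch of that "for loop" with the azure-storage-blob Python SDK; the container name and connection-string variable are assumptions.

```python
# Sketch: list every blob in a container and snapshot each one.
import os
from azure.storage.blob import ContainerClient

container = ContainerClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"], container_name="backups"
)

for blob in container.list_blobs():
    # Snapshots are created per blob; there is no container-level snapshot API.
    snapshot = container.get_blob_client(blob.name).create_snapshot()
    print(f"{blob.name}: snapshot {snapshot['snapshot']}")
```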

Azure blobs backup

We use:
- block blobs to store some durable resources, and
- page blobs to store event data.
We need to back up the blobs, so I tried to use AzCopy. It works fine on my dev machine, but on another, slower machine it fails almost every time with the error "The remote server returned an error: (412) The condition specified using HTTP conditional header(s) is not met.".
We write to the page blobs quite often (up to several times a second, though this is not the common case), so this might be the reason.
Is there a better strategy for backing up blobs that are changing? Or is there a way to work around the problem with the ETag used by AzCopy?
A changed ETag will always halt a copy, since a changing ETag signifies that the source has changed.
The general approach to blob backup is subjective, but objectively:
- Blob copies within Azure itself, in the same region, from storage account to storage account, are significantly faster than copying a blob to an on-premises location (due to general Internet latency), or even than copying from a storage account to a local disk on a VM.
- Blobs support snapshots, which are created almost instantly. If you create a snapshot, the snapshot remains unchanged, allowing you to run your copy operation (AzCopy in your case) against the snapshot instead of the actual blob itself, without fear of the source data changing during the copy. Note: you can create as many snapshots as you want; just be careful, since storage size grows as the underlying original blob changes.
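A hedged sketch of the snapshot-then-copy idea using the azure-storage-blob Python SDK rather than AzCopy; account, container, and blob names are placeholders, and a cross-account copy would normally also need a SAS token on the source URL.

```python
# Sketch: snapshot the frequently changing source blob, then copy the
# immutable snapshot to a backup account so the source cannot change
# (and trigger a 412 precondition failure) mid-copy.
import os
from azure.storage.blob import BlobClient

source = BlobClient.from_connection_string(
    os.environ["SOURCE_CONNECTION_STRING"],
    container_name="events",
    blob_name="stream.dat",
)
backup = BlobClient.from_connection_string(
    os.environ["BACKUP_CONNECTION_STRING"],
    container_name="backups",
    blob_name="stream.dat",
)

# The snapshot is a frozen, read-only view; later writes to the base blob
# do not affect it.
snapshot = source.create_snapshot()

# NOTE: for a cross-account copy the source URL normally needs a SAS token appended.
backup.start_copy_from_url(f"{source.url}?snapshot={snapshot['snapshot']}")
```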

Can we do manipulation with Data stored in Azure Storage directly through VM without downloading it on VM?

I have some data stored in Azure Storage in compressed form, and I want to decompress it. Is it possible to decompress it without downloading it to the virtual machine? I mean to say that the storage would work in the same manner as my secondary storage device does. Ask if you need more detail.
The answer is, as always, "it depends".
Is it possible? Yes. Do you really want to do it? I am not sure.
Take Blob Storage, because I assume you store your data in blob storage. There are two different types of blobs: block blobs and page blobs. Either can be updated by partially modifying its content.
With a block blob, you can modify it using the Put Block operation on the Storage API. With a page blob, you can use the Put Page operation on the Blob Service API.
Of course, after modifying the content you will have to send a final request to the Blob Service API to "commit" the changes and inform the service about the new content (Put Block List for block blobs; Put Page writes to page blobs take effect immediately, with no separate commit step).
Although technically it is possible to manipulate the content of a blob without downloading the whole file, it brings more complications than it solves. For example, once you modify part of the content of a file, all its checksums are broken. Moreover, if it is a compressed file, you also have to modify the file's header. In the end, if you know the exact structure of what you saved and exactly which parts of it you want to modify, you can do it. But I think it would just be over-engineering.
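To illustrate the Put Block / Put Block List flow mentioned above, here is a minimal sketch using stage_block and commit_block_list from the azure-storage-blob Python SDK; the names and block contents are examples, and a real in-place edit would also have to re-commit the blob's existing blocks alongside the newly staged one.

```python
# Sketch of the two-step block blob write: stage blocks (Put Block), then
# commit the block list (Put Block List).
import os
import uuid
from azure.storage.blob import BlobBlock, BlobClient

blob = BlobClient.from_connection_string(
    os.environ["AZURE_STORAGE_CONNECTION_STRING"],
    container_name="data",
    blob_name="archive.bin",
)

# Stage two blocks; staged blocks are invisible until committed.
ids = [str(uuid.uuid4()) for _ in range(2)]
blob.stage_block(block_id=ids[0], data=b"first part of the content")
blob.stage_block(block_id=ids[1], data=b"second part of the content")

# Commit the block list; only now does the blob's visible content change.
blob.commit_block_list([BlobBlock(block_id=i) for i in ids])
```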
