Azure ZRS/GRS vs snapshots - azure

Why would I need to create a blob snapshot and incur additional cost if Azure already provides GRS(Geo redundant storage) or ZRS (Zone redundant storage)?

Redundancy (ZRS/GRS/RA-GRS) provides a means to achieve high availability for your resources (blobs, in your scenario). By enabling redundancy you ensure that a copy of your blob is available in another zone/region in case the primary zone/region becomes unavailable. It also protects against hardware-level corruption of the primary copy.
When you take a snapshot of your blob, a read-only copy of that blob in its current state is created and stored. If needed, you can restore the blob from a snapshot. This is well suited to keeping different versions of the same blob.
However, please keep in mind that neither redundancy nor snapshots are a backup: if you delete the base blob, all snapshots associated with that blob are deleted, and all the copies of that blob in other zones/regions are deleted as well.
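To make the "snapshots are not a backup" point concrete, here is a toy in-memory model (purely illustrative; none of these names are Azure APIs) of how snapshot lifetime is tied to the base blob:

```python
class FakeBlobStore:
    """Toy model of the behaviour described above: snapshots freeze a
    blob's state, but live and die with their base blob."""

    def __init__(self):
        self.blobs = {}       # name -> current bytes
        self.snapshots = {}   # name -> list of frozen byte states

    def upload(self, name, data):
        self.blobs[name] = data
        self.snapshots.setdefault(name, [])

    def snapshot(self, name):
        # A snapshot captures the blob's current state, read-only.
        self.snapshots[name].append(self.blobs[name])

    def delete(self, name):
        # Deleting the base blob takes every snapshot with it --
        # which is why snapshots alone are not a backup.
        del self.blobs[name]
        del self.snapshots[name]

store = FakeBlobStore()
store.upload("report.csv", b"v1")
store.snapshot("report.csv")
store.upload("report.csv", b"v2")
# "Restore" = read an old state back out of a snapshot:
print(store.snapshots["report.csv"][0])   # b'v1'
store.delete("report.csv")
print("report.csv" in store.snapshots)    # False
```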

I guess you need to understand the difference between backup and redundancy.
Backups make sure that if something is lost, corrupted, or stolen, a copy of the data is available at your disposal.
Redundancy makes sure that if something fails (your computer dies, a drive gets fried, a server freezes) you are able to keep working regardless of the problem. Redundancy means that all your changes are replicated to another location. In case of a failover, the replica can theoretically act as the primary and serve the (hopefully) latest state of your file system.

You could also turn on soft delete. That keeps a copy of every blob for every change made to it, even if someone deletes it. You then set a retention period so those soft-deleted blobs are automatically removed after some amount of time.
https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blob-soft-delete
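A sketch of the soft-delete mechanics described above, as a toy model (the class and its names are ours; in the real service you enable this once per storage account and Azure does the bookkeeping):

```python
import datetime as dt

class SoftDeleteStore:
    """Toy model: every overwrite or delete parks the previous bytes
    until the retention window expires. Illustration only; not an
    Azure SDK class."""

    def __init__(self, retention_days=7):
        self.retention = dt.timedelta(days=retention_days)
        self.live = {}       # name -> current bytes
        self.deleted = []    # (expires_at, name, old bytes)

    def upload(self, name, data, now):
        if name in self.live:  # an overwrite keeps the old version around
            self.deleted.append((now + self.retention, name, self.live[name]))
        self.live[name] = data

    def delete(self, name, now):
        self.deleted.append((now + self.retention, name, self.live.pop(name)))

    def purge_expired(self, now):
        # Azure does this automatically once the retention period passes.
        self.deleted = [d for d in self.deleted if d[0] > now]

store = SoftDeleteStore(retention_days=7)
t0 = dt.datetime(2024, 1, 1)
store.upload("log.txt", b"v1", t0)
store.delete("log.txt", t0)
store.purge_expired(t0 + dt.timedelta(days=1))
print(len(store.deleted))   # 1 -- still recoverable inside the window
store.purge_expired(t0 + dt.timedelta(days=8))
print(len(store.deleted))   # 0 -- gone once retention expires
```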

Related

How to disable snapshot in Azure Storage?

Snapshots cause a lot of cost. In some of my storage accounts I don't need them.
But I can't find a place where I can turn it off.
How can I disable snapshots completely from a storage account in Azure?
It's not a feature that can be turned off completely, but snapshots are only created when something explicitly asks for them: you would have to write code to create them, unless you have soft delete enabled. In that case an overwrite creates a snapshot in a deleted state, but it is automatically removed once the soft-delete retention period expires.
Another option is lifecycle management. There you can create a rule that automatically deletes snapshots once they are more than X days old. That check runs daily, so storage costs are only extended by a few days.
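A lifecycle rule like the one described can be expressed as a management policy along these lines (the rule name and the 30-day threshold are placeholders to adjust):

```json
{
  "rules": [
    {
      "enabled": true,
      "name": "delete-stale-snapshots",
      "type": "Lifecycle",
      "definition": {
        "filters": { "blobTypes": [ "blockBlob" ] },
        "actions": {
          "snapshot": { "delete": { "daysAfterCreationGreaterThan": 30 } }
        }
      }
    }
  ]
}
```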
Navigate to the blob in your storage account and look for its snapshots under Snapshots. From there, you can manage them.
https://i.imgur.com/P2LRras.png
If you've already established a resource for it, go to that resource's page and delete it.
https://i.imgur.com/qLvFe3v.png

Are Azure Blob copy operations cheap?

Azure Blob Storage does not expose any kind of "blob rename" operation, which sounds preposterous because renaming an entity is a fundamental operation in almost any storage system. Azure's documentation makes no reference to how a blob's name is used internally (e.g. as a DHT key), but since we can specify our own names it's clear that Azure isn't using a content-addressable storage model, so renaming should be possible once the Azure Storage team decides to allow it.
Microsoft instead advocates that to "rename" a blob, you simply copy it and then delete the original, which seems incredibly inefficient: imagine you have a 200GB video file blob with a typo in the blob name. Unless internally Azure has some kind of dedupe system, in which case it makes perfect sense to eliminate the special case of "blob renaming", because internally it really would be a "name copy" operation.
Unfortunately the current documentation for blob copy ( https://learn.microsoft.com/en-us/rest/api/storageservices/fileservices/copy-blob ) does not describe any internal processes, and in-fact, suggests that the blob copy might be a very long operation:
State of the copy operation, with these values:
success: the copy completed successfully.
pending: the copy is in progress.
If it were using a dedupe system internally, then all blob copy operations would be instantaneous and there would be no need for an "in progress" status. It's also confusing that it uses "pending" to mean "in progress", when normally "pending" means "enqueued, not started yet".
Alarmingly, the documentation also states this:
A copy attempt that has not completed after 2 weeks times out and leaves an empty blob
...which can be taken to mean that there are zero guarantees about the time it takes to copy a blob. Nothing on the page suggests that smaller blobs are copied faster than bigger blobs, so for some reason (a long queue, an unfortunate outage, and so on) it could take 2 weeks to correct my hypothetical typo in my hypothetical 200GB video file. And don't forget that I cannot delete my original misnamed blob until the copy operation has completed, which means designing my client software to keep checking and eventually issue the delete operation (and ensuring my software runs continuously for up to 2 weeks...).
Is there any authoritative information regarding the runtime characteristics and nature of Azure Blob copy operations?
As you may already know, the Copy Blob operation is asynchronous, and everything you mention above is true, with one caveat: the operation is effectively synchronous when copying within the same storage account. You get the same status values whether you're copying blobs across storage accounts or within one, but when the operation is performed within the same storage account, it happens almost instantaneously.
So when you rename a blob, you're creating a copy of the blob in the same storage account (even the same container), which is instantaneous. I am not 100% sure about the internal implementation, but if I am not mistaken, copying a blob within the same storage account doesn't copy the bytes to a separate place. It just creates two pointers (the new blob and the old blob) pointing to the same stored data. Only once you start making changes to one of the blobs, I think, does it go and copy those bytes.
For internal understanding of Azure Storage, I would highly recommend that you read the paper published by the team a few years ago. Please look at my answer here which has links to this paper: Azure storage underlying technology.
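Given the asynchronous behaviour described above, client code has to poll the copy status until it leaves "pending". A minimal polling sketch (the status sequence is faked here so the snippet runs standalone; with the real v12 Python SDK you would read `BlobClient.get_blob_properties().copy.status` after calling `start_copy_from_url`):

```python
import itertools

# Fake stand-in for the service: reports "pending" twice, then "success".
_fake_statuses = itertools.chain(["pending", "pending"],
                                 itertools.repeat("success"))

def get_copy_status():
    return next(_fake_statuses)

def wait_for_copy(get_status, max_polls=1000):
    """Poll an async Copy Blob operation until it is no longer pending."""
    for _ in range(max_polls):
        status = get_status()
        if status == "success":
            return status
        if status in ("failed", "aborted"):
            raise RuntimeError(f"copy ended as {status!r}")
        # "pending" means in-progress, per the docs quoted above; in real
        # code you would sleep with backoff here before polling again.
    raise TimeoutError("copy still pending after max_polls")

print(wait_for_copy(get_copy_status))   # success
```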

Azure blobs backup

We use block blobs to store some durable resources, and page blobs to store event data.
We need to back up the blobs, so I tried to use AzCopy. It works fine on my dev machine, but on another, slower machine it fails almost every time with the error "The remote server returned an error: (412) The condition specified using HTTP conditional header(s) is not met."
We write to page blobs quite often (might be up to several times in a second, but this is not so common case), so this might be the reason.
Is there any better strategy how to backup the changing blobs? Or is there any way how to bypass the problem with ETag used by AzCopy?
A changed ETag will always halt a copy, since a changing ETag signifies that the source has changed.
The general approach to blob backup is subjective, but objectively:
blob copies within Azure itself (same region, storage account to storage account) are going to be significantly faster than copying a blob to an on-premises location (due to general Internet latency), or even than copying from a storage account to a local disk on a VM.
Blobs support snapshots (which are created near-instantly). If you create a snapshot, the snapshot remains unchanged, allowing you to then execute a copy operation against the snapshot instead of the actual blob itself (using AzCopy in your case) without fear of the source data changing during the copy. Note: you can create as many snapshots as you want; just be careful, since storage consumption grows as the underlying original blob changes.
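The snapshot-then-copy approach can be wrapped in a small helper. This is a sketch, not verified against a live account: it assumes the two arguments behave like `azure.storage.blob.BlobClient` (whose `create_snapshot` returns a dict containing a `snapshot` ID, and which starts copies with `start_copy_from_url`); the fake class below just lets the sketch run without a storage account.

```python
def backup_via_snapshot(source_blob, dest_blob):
    # Freeze the source first; the snapshot cannot change mid-copy,
    # so the ETag-mismatch (412) failure mode goes away.
    snap = source_blob.create_snapshot()
    snapshot_url = f"{source_blob.url}?snapshot={snap['snapshot']}"
    return dest_blob.start_copy_from_url(snapshot_url)

class FakeBlobClient:
    """Stand-in so the sketch runs here; not part of any Azure SDK."""
    url = "https://acct.blob.core.windows.net/data/events.log"
    def create_snapshot(self):
        return {"snapshot": "2024-01-01T00:00:00.0000000Z"}
    def start_copy_from_url(self, source_url):
        return {"copy_status": "pending", "source": source_url}

result = backup_via_snapshot(FakeBlobClient(), FakeBlobClient())
print("?snapshot=" in result["source"])   # True
```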

automatic backups of Azure blob storage?

I need to do an automatic periodic backup of an Azure blob storage to another Azure blob storage.
This is in order to guard against any kind of malfunction in the software.
Are there any services which do that? Azure doesn't seem to have this
As @Brent mentioned in the comments to Roberto's answer, the replicas are for HA; if you delete a blob, that delete is replicated instantly.
For blobs, you can very easily create asynchronous copies to a separate blob (even in a separate storage account). You can also make snapshots, which capture a blob at a moment in time. At first, snapshots don't cost anything, but as the blocks/pages referred to by the snapshot are modified, new blocks/pages are allocated. Over time, you'll want to start purging your snapshots. This is a great way to keep data "as-is" over time and revert to a snapshot if there's a malfunction in your software.
With queues, the malfunction story isn't quite the same, as typically you'd only have a small number of queue items present (at least that's the hope; if you have thousands of queue messages, this is typically a sign that your software is falling behind). In any event: when writing queue messages, you could also write them to blob storage for archive purposes, in case there's a malfunction. I wouldn't recommend using blob-based messaging for scaling/parallel processing, since blobs don't have the mechanisms that queues do, but you could use the archive manually in case of malfunction.
There's no copy function for tables. You'd need to write to two tables during your write.
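The dual-write idea can be sketched like so (hypothetical wiring: the clients are assumed to expose an upsert-style call such as `upsert_entity` from the `azure.data.tables` package; a fake client is used here so the sketch runs standalone):

```python
def dual_write(entity, primary, backup):
    # Tables have no server-side copy, so every write goes to both.
    # If the backup write fails you must decide whether to roll back
    # or retry; this sketch simply lets the exception surface.
    primary.upsert_entity(entity)
    backup.upsert_entity(entity)

class FakeTableClient:
    """Stand-in for a table client; not an Azure SDK class."""
    def __init__(self):
        self.rows = []
    def upsert_entity(self, entity):
        self.rows.append(entity)

primary, backup = FakeTableClient(), FakeTableClient()
dual_write({"PartitionKey": "p1", "RowKey": "r1", "value": 42},
           primary, backup)
print(len(primary.rows), len(backup.rows))   # 1 1
```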
Azure keeps 3 redundant copies of your data in different locations in the same data centre where your data is hosted (to guard against hardware failure).
This applies to blob, table and queue storage.
Additionally, you can enable geo-replication on all of your storage. Azure will automatically keep redundant copies of your data in separate data centres. This guards against anything happening to the data centre itself.

windows azure virtual machine hard disk backups

When using Virtual Servers on Windows Azure and attaching disks - Would these go down if the hard disk / server should fail? Or are they guaranteed to be backed up? Should we still keep an external backup ourselves, or can we depend on Azure?
Also, would the primary hard-disk as part of the server, also follow the same backup principles? And same applies for blob-storage, would that need to be backed up?
Azure attached disks, just like the OS disk, are stored as a VHD inside a blob in Azure Storage. This is durable storage: triple-replicated within the data center, and optionally geo-replicated to a neighboring data center.
That said: If you delete something, it's instantly deleted. No going back. So... then the question is, how to deal with backups from an app-level perspective. To that end:
You can make snapshots of the blob storing the VHD. A snapshot is basically a linked list of all the pages in use. If you then make changes, you start incurring additional storage as additional pages are allocated. The neat thing is: you can take a snapshot and, at any point in the future, copy that snapshot to its own VHD. Basically it's like rolling backups with little space used, if you mostly add data rather than modify it.
You can make blob copies. This is an async action that is near-instant within a single data center. I've seen copies take upwards of an hour going cross-data center. But the point is that you can make copies any time you want, and this will be in its own blob.
