Say I need to retrieve 20 thumbnail images from Azure BLOB after a button click. I've read that blobs are accessed like so http://<storage account>.blob.core.windows.net/<container>/<blob>
A single GetBlob() request is charged at 1 transaction. Is this to say getting 20 images will cost, at a minimum, 20 transactions?
Is there a way to send a batch request such that it retrieves those images and is billed at 1 transaction?
I've read about Entity Group Transactions, but it sounded to me they are for Azure Table only.
There's nothing akin to Entity Group Transactions with blobs. Each is accessed individually, burning at least one transaction (depending on blob size).
At a penny per 100,000 transactions, this will likely not be a major cost factor unless you're constantly downloading blobs. In that case, it might be worth considering some type of cache, to prevent excessive activity against Blob Storage.
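To put the transaction cost in perspective, here is a rough back-of-the-envelope sketch. The rate of one cent per 100,000 transactions and the one-transaction-per-blob billing are assumptions for illustration; check current pricing before relying on the numbers.

```python
# Assumed rate: one cent per 100,000 storage transactions (check current pricing).
TRANSACTIONS_PER_CENT = 100_000


def transaction_cost_cents(blobs_per_click: int, clicks: int) -> float:
    """Cost in cents, assuming every blob read is billed as one transaction."""
    transactions = blobs_per_click * clicks
    return transactions / TRANSACTIONS_PER_CENT


# 20 thumbnails per click, one million clicks: 20,000,000 transactions.
print(transaction_cost_cents(20, 1_000_000))  # 200.0 cents, i.e. about $2
```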
One other workaround (hack?): If you're always grabbing a set of related blobs, you could store that related collection in a zip file, in a single blob. Not saying I'm in favor of this, but if you need to save transactions, at least it's an option (aside from cache).
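A minimal sketch of the zip workaround, using only the Python standard library. The upload and download of the resulting bytes to Blob Storage is left out, and the file names and helper functions (bundle, unbundle) are hypothetical.

```python
import io
import zipfile


def bundle(files: dict[str, bytes]) -> bytes:
    """Pack several files into one in-memory zip, ready to store as a single blob."""
    buffer = io.BytesIO()
    # ZIP_STORED skips compression; thumbnails are usually compressed already.
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_STORED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    return buffer.getvalue()


def unbundle(payload: bytes) -> dict[str, bytes]:
    """Unpack the bytes of a downloaded zip blob back into individual files."""
    with zipfile.ZipFile(io.BytesIO(payload)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


# 20 fake thumbnails stand in for real image data.
thumbnails = {f"thumb_{i:02d}.jpg": bytes([i]) * 100 for i in range(20)}
payload = bundle(thumbnails)   # one blob, so one download for the whole set
restored = unbundle(payload)
print(len(restored))  # 20
```

The trade-off is that changing a single thumbnail means rewriting the whole bundle.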
Take a look at this MSDN article, which describes storage and how partitions relate to blobs and tables (scroll down to the Partitions section). The pertinent info for you: Each blob is in its own partition. With table storage, you're able to perform atomic actions on entities within a single partition (there are no atomic actions across multiple partitions). This is why you don't see atomic operations across multiple blobs.
I have studied the following link to understand the Hot, Cool and Archive tiers of Azure Storage V2.
https://learn.microsoft.com/en-us/azure/storage/blobs/storage-blob-storage-tiers
In the Blob rehydration section it says:
To read data in archive storage, you must first change the tier of the blob to hot or cool. This process is known as rehydration and can take up to 15 hours to complete.
My questions are:
Can I get just a list of all blobs without rehydration? Is it going to cost me?
Do I have to perform rehydration before reading/deleting a single file?
Do I have to perform rehydration to delete a file before 180 days?
All answers are taken from the article you linked to:
1) Yes, you can get a list and it will not cost you extra
2) Yes, you have to rehydrate to read file contents, but you can delete without rehydrating
While a blob is in archive storage, the blob data is offline and cannot be read, copied, overwritten, or modified. You can't take snapshots of a blob in archive storage. However, the blob metadata remains online and available, allowing you to list the blob and its properties. For blobs in archive, the only valid operations are GetBlobProperties, GetBlobMetadata, ListBlobs, SetBlobTier, and DeleteBlob.
As an addition to the answer to the reading part of question 2):
Blob-level tiering allows you to change the tier of your data at the object level using a single operation called Set Blob Tier. You can easily change the access tier of a blob among the hot, cool, or archive tiers as usage patterns change, without having to move data between accounts. All tier changes happen immediately. However, rehydrating a blob from archive can take several hours.
3) The 180 days are the minimum amount of time a blob needs to be in archive storage. Changes before that period incur an early deletion charge. This does not change the way you delete blobs, so you can still call DeleteBlob (and be charged the early deletion charge).
Any blob that is deleted or moved out of the cool (GPv2 accounts only) or archive tier before 30 days and 180 days respectively will incur a prorated early deletion charge.
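As a rough illustration of the prorated charge, the sketch below computes how many days of storage would still be billed when a blob leaves its tier early. The function name is hypothetical, only the 30-day and 180-day minimums come from the quoted text, and the actual price per day is left out.

```python
# Minimum retention periods from the quoted documentation, in days.
MINIMUM_DAYS = {"cool": 30, "archive": 180}


def early_deletion_days(tier: str, days_stored: int) -> int:
    """Days still billed when a blob is deleted or moved after days_stored days."""
    minimum = MINIMUM_DAYS.get(tier, 0)  # tiers without a minimum bill nothing extra
    return max(minimum - days_stored, 0)


print(early_deletion_days("archive", 45))  # 135
print(early_deletion_days("cool", 30))     # 0
```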
Here is the problem. I have devices pushing telemetry messages to Azure IoT Hub and currently, I save all messages to Table Storage with the device Id as the partition key and the telemetry kind as the row key. What I want to do is restrict the size of the stored data. For instance, the table should keep only up to 50 MB and then it should be cleared. What kind of storage should I use for such a use case and what are the benefits? Any suggestions are highly appreciated.
Neither Azure Tables nor Azure Blobs have a feature where the content automatically gets deleted after a certain size is reached. In fact, I don't think I have come across any cloud storage solution that offers it (I have seen data get automatically deleted based on age).
Thus if you want to delete the data once it reaches a certain size, you will have to write some code and schedule it (using either Functions or WebJobs). That code will find the size occupied and delete the data going over the limit.
Between Blobs and Tables, I am somewhat conflicted. With Blobs, it is much easier to get the storage consumed - You just list the blobs in a container and sum up the size of the blobs. With tables, you will need to keep on fetching entities (i.e. download the data) and calculate the size of that data. But then deleting data from tables is easier as you will be deleting rows (unless you store each record in a separate blob).
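A minimal sketch of the size check such a scheduled job might run. It works on a plain list of item descriptions (for example, the result of listing blobs), so no storage SDK is involved. The StoredItem type, the oldest-first policy, and the 50 MB limit as a parameter are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class StoredItem:
    name: str
    size: int                 # size in bytes
    last_modified: datetime


def items_to_delete(items: list[StoredItem], limit_bytes: int) -> list[StoredItem]:
    """Pick the oldest items to remove until the total is at or below the limit."""
    total = sum(item.size for item in items)
    doomed = []
    for item in sorted(items, key=lambda i: i.last_modified):
        if total <= limit_bytes:
            break
        doomed.append(item)
        total -= item.size
    return doomed


MB = 1024 * 1024
items = [
    StoredItem("day1", 20 * MB, datetime(2024, 1, 1)),
    StoredItem("day2", 20 * MB, datetime(2024, 1, 2)),
    StoredItem("day3", 20 * MB, datetime(2024, 1, 3)),
]
print([item.name for item in items_to_delete(items, 50 * MB)])  # ['day1']
```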
If the limit were based on data age rather than data size, I would have recommended Cosmos DB. Though it is more expensive than Azure Storage, you could define a TTL at the collection level and, based on that policy, the documents would be automatically deleted.
In our service, we are using SQL Azure as the main storage, and Azure table for the backup storage. Every day about 30 GB of data is collected and stored in SQL Azure. Since the data is no longer valid from the next day, we want to migrate the data from SQL Azure to Azure table every night.
The question is: what would be the most efficient way to migrate data from SQL Azure to Azure table?
The naive idea I came up with is to leverage the producer/consumer concept by using IDataReader. That is, first get a data reader by executing "select * from TABLE" and put the data into a queue. At the same time, a set of threads grabs data from the queue and inserts it into Azure Table.
Of course, the main disadvantage of this approach (I think) is that we need to keep the connection open for a long time (possibly several hours).
Another approach is to first copy data from SQL Azure table to local storage on Windows Azure, and use the same producer/consumer concept. In this approach we can disconnect the connection as soon as the copy is done.
At this point, I'm not sure which one is better, or even whether either of them is a good design to implement. Could you suggest a good design for this problem?
Thanks!
I would not recommend using local storage, primarily because:
It is transient storage.
You're limited by the size of local storage (which in turn depends on the size of the VM).
Local storage is local only i.e. it is accessible only to the VM in which it is created thus preventing you from scaling out your solution.
I like the idea of using queues, however I see some issues there as well:
Assuming you're planning on storing each row in a queue as a message, you would be performing a lot of storage transactions. If we assume that your row size is 64KB, to store 30 GB of data you would be doing about 500000 write transactions (and similarly 500000 read transactions) - I hope I got my math right :). Even though the storage transactions are cheap, I still think you'll be doing a lot of transactions which would slow down the entire process.
Since you're doing so many transactions, you may get hit by storage thresholds. You may want to check into that.
Yet another limitation is the maximum size of a message. Currently a maximum of 64KB of data can be stored in a single message. What would happen if your row size is more than that?
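The estimate of roughly 500,000 transactions can be checked with a couple of lines. This assumes binary units (1 GB = 1024^3 bytes, 1 KB = 1024 bytes) and exactly one queue message per 64 KB row.

```python
ROW_SIZE_BYTES = 64 * 1024          # assumed row size, also the queue message limit
DAILY_DATA_BYTES = 30 * 1024 ** 3   # 30 GB collected per day

# Ceiling division: one message per row, rounding up for a partial last row.
messages = -(-DAILY_DATA_BYTES // ROW_SIZE_BYTES)
print(messages)  # 491520 writes, and the same number of reads
```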
I would actually recommend throwing blob storage in the mix. What you could do is read a chunk of data from SQL table (say 10000 or 100000 records) and save that data in blob storage as a file. Depending on how you want to put the data in table storage, you could store the data in CSV, JSON or XML format (XML format for preserving data types if it is needed). Once the file is written in blob storage, you could write a message in the queue. The message will contain the URI of the blob you've just written. Your worker role (processor) will continuously poll this queue, get one message, fetch the file from blob storage and process that file. Once the worker role has processed the file, you could simply delete that file and the message.
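The flow described above can be sketched with in-memory stand-ins, to show the shape of the pipeline rather than the real storage calls. The dict, queue.Queue and list below are hypothetical placeholders for blob storage, a storage queue and the destination table; the chunk naming scheme is also made up.

```python
import csv
import io
import queue

blob_store = {}             # stand-in for blob storage: URI -> file contents
work_queue = queue.Queue()  # stand-in for the storage queue holding blob URIs
table = []                  # stand-in for the destination table


def produce(rows, chunk_size):
    """Write rows to 'blobs' in CSV chunks and enqueue one message per chunk."""
    for index, start in enumerate(range(0, len(rows), chunk_size)):
        chunk = rows[start:start + chunk_size]
        buffer = io.StringIO()
        writer = csv.DictWriter(buffer, fieldnames=list(chunk[0].keys()))
        writer.writeheader()
        writer.writerows(chunk)
        uri = f"chunks/chunk-{index:05d}.csv"
        blob_store[uri] = buffer.getvalue()
        work_queue.put(uri)  # the message carries only the URI, not the data


def consume():
    """Worker loop: fetch each chunk, load it into the table, then clean up."""
    while not work_queue.empty():
        uri = work_queue.get()
        reader = csv.DictReader(io.StringIO(blob_store[uri]))
        table.extend(dict(row) for row in reader)
        del blob_store[uri]  # delete the file once it has been processed
        work_queue.task_done()


rows = [{"id": str(i), "value": f"v{i}"} for i in range(25)]
produce(rows, chunk_size=10)   # 3 chunks: 10 + 10 + 5 rows
consume()
print(len(table))  # 25
```

Because each message is just a URI, the 64 KB message limit no longer constrains row size, and the number of queue transactions drops to one write and one read per chunk.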
I need to do an automatic periodic backup of an Azure blob storage to another Azure blob storage.
This is in order to guard against any kind of malfunction in the software.
Are there any services which do that? Azure doesn't seem to have this
As @Brent mentioned in the comments to Roberto's answer, the replicas are for HA; if you delete a blob, that delete is replicated instantly.
For blobs, you can very easily create asynchronous copies to a separate blob (even in a separate storage account). You can also make snapshots which capture a blob at a current moment in time. At first, snapshots don't cost anything, but if you start modifying the blocks/pages referred to by the snapshot, then new blocks/pages are allocated. Over time, you'll want to start purging your snapshots. This is a great way to keep data "as-is" over time and revert back to a snapshot if there's a malfunction in your software.
With queues, the malfunction story isn't quite the same, as typically you'd only have a small number of queue items present (at least that's the hope; if you have thousands of queue messages, this is typically a sign that your software is falling behind). In any event: You could, when writing queue messages, also write them to blob storage, for archive purposes, in case there's a malfunction. I wouldn't recommend using blob-based messaging for scaling/parallel processing, since blobs don't have the mechanisms in place that queues do, but you could use them manually in case of malfunction.
There's no copy function for tables. You'd need to write to two tables during your write.
Azure keeps 3 redundant copies of your data in different locations in the same data centre where your data is hosted (to guard against hardware failure).
This applies to blob, table and queue storage.
Additionally, you can enable geo-replication on all of your storage. Azure will automatically keep redundant copies of your data in separate data centres. This guards against anything happening to the data centre itself.
See Here
It's quite a topic, blobs vs tables vs SQL, and despite all I read so far I still can't find some proper reasoning on what to use when.
We have a multi-tenant SaaS web-application which we are about to move to Azure. We use an SQL Server 2008 database. We store documents and log information that belongs to the documents. Kinda like dropbox does.
The forums state that you had better use Azure Tables when you are considering "large" objects. We typically store hundreds of documents per user, where the size of the documents varies from 5 KB to 30 MB and the vast majority will be around 1 MB.
Are there some ground rules for when to go for Blobs, Tables, or SQL? I already learned that I shouldn't store my documents in SQL since it is too expensive. But when does it get "beneficial" to store the documents in Blobs and when would I be better off with tables? Is there some kind of formula like:
if (objects * MB/object * objectrequested > y) then blobs, else tables
I think Igorek has addressed your SQL Azure concerns. You seem to still have questions about Tables vs Blobs, though.
In your case using Table storage would be annoying. Each property/column in ATS can be at most 64KB, so you would have to split the documents across multiple properties and then reassemble them. There is also a limit of 1MB per entity, which would be a problem. Blob storage has neither of these limitations.
I would tend to use Azure Table Storage when you have smallish entities with many properties that need to be stored and queried separately. So it works well for stored objects, or small documents with lots of metadata.
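To show what the splitting and reassembly would involve, here is a small sketch. The property naming scheme (Chunk0000, Chunk0001, ...) and the helper functions are hypothetical; only the 64 KB per-property limit comes from the text above.

```python
MAX_PROPERTY_BYTES = 64 * 1024  # per-property limit mentioned above


def split_into_properties(document: bytes) -> dict[str, bytes]:
    """Cut a document into property-sized chunks keyed by a sortable name."""
    return {
        f"Chunk{index:04d}": document[offset:offset + MAX_PROPERTY_BYTES]
        for index, offset in enumerate(range(0, len(document), MAX_PROPERTY_BYTES))
    }


def reassemble(properties: dict[str, bytes]) -> bytes:
    """Join the chunks back together in name order."""
    return b"".join(properties[name] for name in sorted(properties))


document = bytes(1024 * 1024)  # a typical 1 MB document, filled with zero bytes
properties = split_into_properties(document)
print(len(properties))  # 16 properties for a single 1 MB document
```

Even when the chunking itself works, the per-entity size limit still applies, so larger documents would not fit in one entity at all.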
Blob storage works better for things without a ton of metadata. It's good for things that might work well as files on a filesystem.
I would store documents themselves in the Azure Blob storage (not table storage). Outside of the fact that it is pretty expensive to store documents in a SQL Azure database that charges a penny per meg (or less depending on volume), SQL database is generally not a good place for documents. SQL is a relational database that provides benefits of ability to do queries, joins, etc. There is usually no benefit to storing large documents or images in a SQL database, especially when there is a highly scalable central storage system that is pretty cheap to store/access.
Now, if you need to search through the documents themselves, I'd use something like Lucene.NET to provide a search capability for a document-based repository.
HTH