Azure billing is based on the amount of storage space used. Now I need the details: what is the size of each storage object in my storage account (each blob container, each table)?
It's easy to write code that enumerates all blobs and calculates the overall size per container (see the rough sketch below). But what about tables? How can I get the size of a specific table in Azure Storage?
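For reference, the per-container enumeration I have in mind looks roughly like this (a sketch assuming the Azure.Storage.Blobs SDK; the connection string is a placeholder):

using System;
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;

string connectionString = "<your storage connection string>";
var service = new BlobServiceClient(connectionString);

foreach (BlobContainerItem container in service.GetBlobContainers())
{
    long totalBytes = 0;
    BlobContainerClient containerClient = service.GetBlobContainerClient(container.Name);

    // Flat-list every blob in the container and sum the reported content length
    foreach (BlobItem blob in containerClient.GetBlobs())
        totalBytes += blob.Properties.ContentLength ?? 0;

    Console.WriteLine($"{container.Name}: {totalBytes} bytes");
}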
If you're not interested in a per-container breakdown, you don't have to write any code at all to find the blob storage size. This information is available to you via storage analytics (http://msdn.microsoft.com/en-us/library/windowsazure/hh343270.aspx). The table of interest to you is $MetricsCapacityBlob (http://msdn.microsoft.com/en-us/library/windowsazure/hh343264.aspx).
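If it helps, here is a rough sketch of reading that metrics table with the Azure.Data.Tables SDK. The "Capacity"/"ObjectCount" properties and the "data" row key follow the documented metrics schema, but treat those details as assumptions to verify against the docs linked above:

using System;
using Azure.Data.Tables;

string connectionString = "<your storage connection string>";
var metrics = new TableClient(connectionString, "$MetricsCapacityBlob");

// RowKey "data" holds capacity of your own data; "analytics" holds the analytics data itself
foreach (TableEntity entity in metrics.Query<TableEntity>(filter: "RowKey eq 'data'"))
{
    Console.WriteLine($"{entity.PartitionKey}: {entity.GetInt64("Capacity")} bytes, " +
                      $"{entity.GetInt64("ObjectCount")} blobs");
}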
Coming to tables, unfortunately no such metric is available and you would need to fetch all entities and calculate the size of each entity to find the table size. You may find this blog post useful for calculating the size of an entity: http://blogs.msdn.com/b/avkashchauhan/archive/2011/11/30/how-the-size-of-an-entity-is-caclulated-in-windows-azure-table-storage.aspx.
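The estimation described in that post is roughly: 4 bytes, plus 2 bytes per character of PartitionKey and RowKey, plus, for each property, 8 bytes of overhead, 2 bytes per character of the property name, and the size of the value. A rough helper along those lines (a sketch only; it handles string properties for brevity, and other types have fixed sizes such as 8 bytes for long/double/DateTime, 4 for int, 16 for Guid according to the post):

using System.Collections.Generic;

// Rough per-entity size estimate based on the formula in the blog post above.
static long EstimateEntitySize(string partitionKey, string rowKey,
                               IDictionary<string, string> stringProperties)
{
    long size = 4 + 2 * (partitionKey.Length + rowKey.Length);
    foreach (var property in stringProperties)
        size += 8 + 2 * property.Key.Length        // per-property overhead + name
              + 4 + 2 * property.Value.Length;     // string value: 4 bytes + 2 per character
    return size;
}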
HTH.
There is a tool which can get the table size or entity count for you: Azure Storage Manager.
Select a storage table in the left tree pane
Click the 'Property' button
Click the 'Calc' button on the table properties dialog
Wait a few moments until the 'Calc' button becomes available again.
Here's the step-by-step of how to get this info:
Go into "Monitor" in Azure (it's a top-level item on the left nav by default); it looks like a speedometer, or perhaps a really fast one-handed clock.
Then select Metrics (it's below Alerts, and above Logs, in the first grouping).
Then from the "Select a scope" pop-up, select your storage account and press "Apply".
Then, on the empty chart that appears, there are some drop-downs; the first one will have the scope you applied. The second one, Metric Namespace, should be "Table"; the third one, Metric, should be "Table Capacity". You can leave the last one as Avg -- if you only have one table in your storage account, the Avg will simply be the exact size of that table.
If you want to calculate the average row size, you can do a simple divide -- in my case I did 1.4 GB / 2.5M entities = ~560 bytes
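If you'd rather pull the same number programmatically than click through the portal, a minimal sketch with the Azure.Monitor.Query package might look like this. The metric name "TableCapacity" and the tableServices/default resource path are assumptions to verify against the storage metrics documentation, and the resource ID values are placeholders:

using System;
using Azure.Identity;
using Azure.Monitor.Query;

var client = new MetricsQueryClient(new DefaultAzureCredential());

// Resource ID of the table service sub-resource of the storage account (placeholders)
string resourceId =
    "/subscriptions/<sub-id>/resourceGroups/<rg>/providers/Microsoft.Storage" +
    "/storageAccounts/<account>/tableServices/default";

var response = await client.QueryResourceAsync(resourceId, new[] { "TableCapacity" });

// Print the averaged capacity samples for the default time range
foreach (var metric in response.Value.Metrics)
    foreach (var series in metric.TimeSeries)
        foreach (var point in series.Values)
            Console.WriteLine($"{point.TimeStamp}: {point.Average} bytes");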
Related
I am looking at the billing for my Azure Storage Account and trying to understand how to manage its cost.
Currently most of my blob cost falls under the "All Other Operations" category. Is there a way to see what operations these are?
I would like to reduce this cost, so the goal is to update my app so these operations are performed less often, but I first need to identify what they are.
Below is the graph I get from cost analysis. (Storage accounts, Accumulated cost, grouped by meter)
After a support call with Azure, they pointed me to some of the (somewhat hidden) tracing capabilities.
First and easiest is to check the type of transactions.
Go to the storage account > Metrics
Select Transactions as the metric
Click Add Filter and select API Name as the property
Select the API names you think are the suspects
Unfortunately, selecting multiple API names doesn't show them separately, so you have to try each API individually and see if anything sticks out.
Second option is to enable Diagnostics Logging for the storage type you're interested in.
If the above doesn't yield any good results, or you're curious about the exact calls at exact times etc., you can enable this feature and wait for logs to be collected, usually over a few days, so you have a good sample set to reason about.
Go to the storage account > Diagnostic settings (classic).
This is under Monitoring (classic); it doesn't seem to have a replacement in the new Monitoring section.
Enable logging and choose the metrics type (hour or minute)
Click Save
These logs are written to blob storage in the same account, to a container named $logs. According to the docs, this container cannot be deleted once enabled, but its contents can be deleted when you're done.
Note that if your storage account gets a lot of traffic, this log can get very big very quickly. You're charged the usual rates for reads, writes and storage in this container, including the log writes the platform does when these settings are enabled.
See documentation here
After setting this up, give it some time to collect data.
Use storage explorer or other means to navigate and download the logs and inspect them.
Logs contain every request made to storage, with details such as timestamp, API name, result, whether the operation was authenticated, and if you're looking at blobs it also shows the url, the user agent and more.
(turns out my app did close to 100,000 calls to GetBlobProperties and GetContainerProperties per day🎈)
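If you want to tally those calls yourself rather than eyeballing the log files, a rough sketch like the following counts requests by operation type. It assumes the Azure.Storage.Blobs SDK and the semicolon-delimited 1.0 log format, where the operation type is the third field; verify those assumptions against the log format docs. The connection string is a placeholder:

using System;
using System.Collections.Generic;
using System.Linq;
using Azure.Storage.Blobs;

string connectionString = "<your storage connection string>";
var logs = new BlobContainerClient(connectionString, "$logs");
var countsByOperation = new Dictionary<string, long>();

foreach (var blobItem in logs.GetBlobs())
{
    // Each log blob is plain text, one request per line, fields separated by ';'
    string content = logs.GetBlobClient(blobItem.Name).DownloadContent().Value.Content.ToString();
    foreach (var line in content.Split('\n', StringSplitOptions.RemoveEmptyEntries))
    {
        var fields = line.Split(';');
        if (fields.Length > 2)
            countsByOperation[fields[2]] = countsByOperation.GetValueOrDefault(fields[2]) + 1;
    }
}

foreach (var pair in countsByOperation.OrderByDescending(p => p.Value))
    Console.WriteLine($"{pair.Key}: {pair.Value}");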
Short answer to your question is Yes.
Analysis:
According to my observation, I get "all other operations" when I group by "Meter", as shown in the screenshot below.
If I then export the results by clicking "Export" and filter the exported results on the "Meter" column for "all other operations", I see that the "ServiceTier" column has "tiered block blob" as its value (in my case). For reference, see the screenshot below.
If I instead group by "Meter subcategory", as shown in the screenshot below, I again see "tiered block blob" (in my case).
And if I export those results and filter the "Meter subcategory" column for "tiered block blob", the "ServiceTier" column also shows "tiered block blob". For reference, see the screenshot below.
So based on the above analysis, I believe we can figure out the breakdown of the "all other operations" meter, in my case as "tiered block blob", with the help of the "Meter subcategory" and "ServiceTier" columns. Similarly, you should be able to figure out the breakdown of "all other operations" in your case.
Hope this helps! Cheers!
Other related references: as per this and this Azure documentation, there are many other operations on blobs besides write, read and list operations, so in your case any such operations might have fallen under the "all other operations" category.
We're using a Stream Analytics component in Azure to send data (log messages from different web apps) to a table storage account. The messages are retrieved from an Event Hub, but I think this doesn't matter here.
Within the Stream Analytics component we defined an output for the table storage account including partition and row key settings. As of now the partition key will be the name of the app that sent the log message in the first place. This might not be ideal, but I'm lacking experience in choosing the right values here. However, I think this is a whole different topic. The row key will be a unique id of the specific log message.
Now when I watch the Stream Analytics Output within the Azure portal the following warning message pops up very frequently (and sometimes disappears for a couple of seconds):
Warning: Output contains multiple rows and just one row per partition key. If the output latency is higher than expected, consider choosing a partition key that splits output into multiple partitions while maintaining about 100 records per partition.
Regarding this message I have two questions:
What does this exactly mean or why does it happen? I can see that a single new log message will always qualify as "just one row per partition key", simply because it's just one row. But looking at maybe hundreds of rows sent within a short period of time they all share just three partition keys (three apps logging to the Event Hub), pretty much equally divided. That's why I don't get the whole "Output contains multiple rows and just one row per partition key" thing.
Does this in any way affect the performance or overall functionality of the Stream Analytics component or the table storage?
I also played with the "Batch size" setting of the table storage output, but this didn't change anything.
Thanks in advance for reading and trying to help.
What does this exactly mean or why does it happen?
It is a warning, not an error. It means that each row in your output has a unique partition key.
I can see that a single new log message will always qualify as "just one row per partition key", simply because it's just one row.
The warning doesn't really fit the single-message case. I suggest you post feedback on the Azure feedback site, which is used for collecting user suggestions and bug reports.
https://feedback.azure.com/forums/34192--general-feedback
Does this in any way affect the performance or overall functionality of the Stream Analytics component or the table storage?
No, you could just ignore the warning.
In my Table storage there are 10,000 entities per partition. Now I would like to load a whole partition into memory. However, this takes very long. I was wondering if I am doing something wrong, or if there is a way to do this faster. Here is my code:
public List<T> GetPartition<T>(string partitionKey) where T : TableServiceEntity
{
    CloudTableQuery<T> partitionQuery = (from e in _context.CreateQuery<T>(TableName)
                                         where e.PartitionKey == partitionKey
                                         select e).AsTableServiceQuery<T>();

    return partitionQuery.ToList();
}
Is this the way it is supposed to be done, or is there anything equivalent to batch insertion for getting elements out of the table again?
Thanks a lot,
Christian
EDIT
We also have all the data available in blob storage. That means one partition is serialized completely as a byte[] and saved in a blob. When I retrieve that from blob storage and then deserialize it, it is way faster than reading it from the table. Almost 10 times faster! How can this be?
In your case I think turning off change tracking could make a difference:
context.MergeOption = MergeOption.NoTracking;
Take a look at MSDN for other possible improvements: .NET and ADO.NET Data Service Performance Tips for Windows Azure Tables
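For illustration, a few of the client-side tweaks commonly cited for that client library (a sketch only; the connection limit value is an arbitrary assumption, and the actual gains depend on your workload, so measure before settling on values):

using System.Data.Services.Client;
using System.Net;

// Raise the per-host connection limit so parallel requests aren't throttled client-side
ServicePointManager.DefaultConnectionLimit = 48;

// Avoid the extra 100-Continue round trip and Nagle delays on small payloads
ServicePointManager.Expect100Continue = false;
ServicePointManager.UseNagleAlgorithm = false;

// Don't track returned entities if you only read them
_context.MergeOption = MergeOption.NoTracking;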
Edit: To answer your question why a big file in blob storage is faster: you have to know that the maximum number of records you can get back in a single table request is 1,000. This means that to fetch 10,000 entities you'll need to make 10 requests, instead of 1 single request against blob storage. Also, when working with blob storage you don't go through WCF Data Services, which can also have a big impact.
In addition, make sure you are on the second generation of Azure Storage... it's essentially a "no cost" upgrade if you are in a data center that supports it. It uses SSDs and an upgraded network topology.
http://blogs.msdn.com/b/windowsazure/archive/2012/11/02/windows-azure-s-flat-network-storage-and-2012-scalability-targets.aspx
Microsoft will not migrate your existing account; simply re-create it and you get "upgraded for FREE" to the 2nd-gen Azure Storage.
My company is interested in using Azure storage tables. They have asked me to look into access times, but so far I have not found any information on this. I have a few questions that perhaps someone here could help answer.
Any information / links or anything on the read / write access times of Azure table storage
If I use a partition key and row key for direct access, does read time increase with the number of fields?
Is anyone aware of future plans for Azure storage, such as a decrease in price, an increase in access speed, the ability to index, or an increase in storage size per row?
Storage is, I understand, 1 MByte / row. Does this include space for the field names? I assume it does.
Is there any way to determine how much space is used for a row in Azure storage? Any API for this?
Hope someone can help answer even one or two of these questions.
PLEASE note this question only applies to TABLE STORAGE.
Thanks
Microsoft has a blog post about scalability targets.
For actual storage per row, here's an excerpt from that post:
Entity (Row) – Entities (an entity is analogous to a "row") are the basic data items stored in a table. An entity contains a set of properties. Each table has two properties, "PartitionKey and RowKey", which form the unique key for the entity. An entity can hold up to 255 properties. Combined size of all of the properties in an entity cannot exceed 1MB. This size includes the size of the property names as well as the size of the property values or their types.
You should see performance around 500 transactions per second, on a given partition.
I know of no plans to reduce storage cost. It's currently at $0.15 / GB / month.
You can optimize table storage write speed by combining writes within a single partition - this is an entity group transaction. See here for more detail.
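As a rough illustration (shown here with the newer Azure.Data.Tables SDK rather than the library from that era, so treat the exact API as an assumption; the connection string, table name and entity collection are placeholders), an entity group transaction batches writes that share a partition key into one request:

using System.Collections.Generic;
using Azure.Data.Tables;

string connectionString = "<your storage connection string>";
var table = new TableClient(connectionString, "mytable");

// entitiesForOnePartition is a placeholder collection of TableEntity instances.
// All actions in one transaction must target the same PartitionKey,
// and a single transaction holds at most 100 entities.
IEnumerable<TableEntity> entitiesForOnePartition = GetEntitiesSomehow();

var actions = new List<TableTransactionAction>();
foreach (TableEntity entity in entitiesForOnePartition)
    actions.Add(new TableTransactionAction(TableTransactionActionType.Add, entity));

table.SubmitTransaction(actions);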
To add to David's answer: the Microsoft Extreme Computing Group has a pretty comprehensive series of performance benchmarks on all things Azure, including Azure tables.
From the above benchmarks (under read latency):
Entity size does not significantly affect the latencies
So I wouldn't be overly concerned about adding more properties.
Secondary indexes on Azure Tables have come up as a requested feature since the service was first released, and at one point it was even talked about as if it would be in an upcoming release. MS has since fallen very quiet about it. I understand that MS is working on it (or at the very least thinking very hard about it), but there is no time frame for when/if it will be released.
I'm currently trying to store a fairly large and dynamic data set.
My current design is tending towards a solution where I will create a new table every few minutes - this means every table will be quite compact, it will be easy for me to search my data (I don't need everything in one table) and it should make it easy for me to delete stale data.
I've looked and I can't see any documented limits - but I wanted to check:
Is there any limit on the number of tables allowed within one Azure storage account?
Or can I keep adding potentially thousands of tables without any concern?
There are no published limits on the number of tables, only the capacity limit on a given storage account (originally 100 TB, since raised to 500 TB). Combined with partition + row keys, it sounds like you'll have a direct link to your data without running into any table-scan issues.
This MSDN article explicitly calls out: "You can create any number of tables within a given storage account, as long as each table is uniquely named." Have fun!