I am trying to benchmark search/read and insert queries on an Azure Table Storage (ATS) table that is small (500 entities). The average insert time is 400 ms and the average search + retrieve time is 190 ms.
When searching, I am querying on the partition key and the condition is composed of only one predicate: [PartitionKey] eq <value> (no further ands/ors). Also, I am returning only one property.
What could be the reason for such results?
Search code:
// Filter on the partition key and project only the "State" property.
TableQuery<DynamicTableEntity> projectionQuery = new TableQuery<DynamicTableEntity>()
    .Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, msg.PartitionKey))
    .Select(new string[] { "State" });
// Define an entity resolver to work with the entity after retrieval.
EntityResolver<bool?> resolver = (pk, rk, ts, props, etag) => props.ContainsKey("State") ? (props["State"].BooleanValue) : null;
Stopwatch sw = new Stopwatch();
sw.Start();
List<bool?> sList = table.ExecuteQuery(projectionQuery, resolver, null, null).ToList();
sw.Stop();
Insert Code:
CloudTable table = tableClient.GetTableReference("Messages");
TableOperation insertOperation = TableOperation.Insert(msg);
Stopwatch sw = new Stopwatch();
// Execute the insert operation.
sw.Start();
table.Execute(insertOperation);
sw.Stop();
You can refer to this post for possible performance issues: Microsoft Azure Storage Performance and Scalability Checklist.
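In particular, two client-side .NET settings from that checklist (disabling the Nagle algorithm and the Expect: 100-Continue handshake) are a very common cause of small-payload insert latencies in the hundreds of milliseconds. A minimal sketch, assuming a .NET client; apply it once before the first storage call (the connection limit value is illustrative):
// Requires: using System.Net;
// Apply once at application startup, before creating the table client.
ServicePointManager.Expect100Continue = false;    // avoid the extra 100-Continue round trip
ServicePointManager.UseNagleAlgorithm = false;    // avoid Nagle buffering delays on small requests
ServicePointManager.DefaultConnectionLimit = 100; // raise the default of 2 connections per endpoint (illustrative)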
The reason you only get one property back is that you're using an EntityResolver; please try removing it. Refer to Windows Azure Storage Client Library 2.0 Tables Deep Dive for the usage of EntityResolver: when you should use it and how to use it correctly.
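For comparison, here is a minimal sketch of the same projection query without the resolver, reusing table and msg from the question; the service still returns only the "State" property, and the value is read client-side:
TableQuery<DynamicTableEntity> query = new TableQuery<DynamicTableEntity>()
    .Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, msg.PartitionKey))
    .Select(new[] { "State" });

// No EntityResolver: read the projected property from the returned entities instead.
List<bool?> states = table.ExecuteQuery(query)
    .Select(e => e.Properties.ContainsKey("State") ? e.Properties["State"].BooleanValue : null)
    .ToList();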
From the SLA Document:
Storage
We guarantee that at least 99.99% of the time, we will successfully process requests to read data from Read Access-Geo Redundant Storage (RA-GRS) Accounts, provided that failed attempts to read data from the primary region are retried on the secondary region.

We guarantee that at least 99.9% of the time, we will successfully process requests to read data from Locally Redundant Storage (LRS), Zone Redundant Storage (ZRS), and Geo Redundant Storage (GRS) Accounts.

We guarantee that at least 99.9% of the time, we will successfully process requests to write data to Locally Redundant Storage (LRS), Zone Redundant Storage (ZRS), and Geo Redundant Storage (GRS) Accounts and Read Access-Geo Redundant Storage (RA-GRS) Accounts.
And also from the referenced document:
Table Query / List Operations
Maximum Processing Time: Ten (10) seconds (to complete processing or return a continuation)
There is no commitment to fast / low response times, nor is there any commitment to being faster with smaller tables.
Related
Although I am setting high RUs, I am not getting the required results.
The background is: I am working on an IoT application and, unfortunately, the partition key is chosen very badly: {deviceID} + {dd/mm/yyyy hh:mm:ss}. Technically speaking, this means each logical partition holds very few items (it will never reach the 10 GB limit), but I suspect a huge number of physical partitions have been created, which forces my RUs to be split across them. How do I get the list of physical partitions?
You can't control partitions, nor can you get a partition list, but you don't actually need them; it's not as if each partition is placed on a separate box. If you are suffering from low performance, you need to identify what is causing throttling. You can use the Metrics blade to identify throttled partitions and figure out why they are throttled. You can also use diagnostic settings and stream the logs to Log Analytics to gain additional insights.
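If you are on the V2 .NET SDK, a quick way to confirm throttling in code is to watch for DocumentClientException with status 429. A minimal sketch, assuming a DocumentClient named documentClient and placeholder database/container/document names:
// Requires: using System.Net; using Microsoft.Azure.Documents; using Microsoft.Azure.Documents.Client;
try
{
    ResourceResponse<Document> response = await documentClient.ReadDocumentAsync(
        UriFactory.CreateDocumentUri("myDb", "myContainer", "myDocId"),           // placeholder names
        new RequestOptions { PartitionKey = new PartitionKey("myPartitionKeyValue") });
    Console.WriteLine($"RU charge: {response.RequestCharge}");
}
catch (DocumentClientException ex) when (ex.StatusCode == (HttpStatusCode)429)
{
    // 429 = request rate too large: this partition's share of the provisioned RUs was exceeded.
    Console.WriteLine($"Throttled; suggested back-off: {ex.RetryAfter}");
}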
We can get the list of partition key ranges using this API. Partition key ranges might change in the future as the data changes.
Physical partitions are internal implementations. We don't have any control over the size or number of physical partitions and we can't control the mapping between logical & physical partitions.
But we can control the distribution of data over logical partitions by choosing an appropriate partition key that spreads data evenly across multiple logical partitions.
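For the {deviceID} + {timestamp-to-the-second} key from the question, one option is a coarser synthetic key, e.g. device plus day, so that writes still spread across devices but each logical partition holds a day's worth of readings. A hypothetical sketch (the method name and format are illustrative, not a recommendation for every workload):
// Illustrative synthetic partition key: device + day instead of device + second.
public static string BuildPartitionKey(string deviceId, DateTime timestampUtc)
{
    // e.g. "device-42_2019-07-01": one logical partition per device per day.
    return $"{deviceId}_{timestampUtc:yyyy-MM-dd}";
}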
This information used to be displayed straightforwardly in the portal, but it was removed in a redesign.
I feel that this is a mistake, as provisioning RUs requires knowledge of the peak RU per partition multiplied by the number of partitions, so this number should be easily accessible.
The information is returned in the JSON sent to the portal but not shown to us. For collections provisioned with dedicated throughput (i.e. not using database-provisioned throughput), this JavaScript bookmarklet shows the information:
javascript:(function () { var ss = ko.contextFor($(".ext-quickstart-tabs-left-margin")[0]).$rawData().selectedSection(); var coll = ss.selectedCollectionId(); if (coll === null) { alert("Please drill down into a specific container"); } else { alert("Partition count for container " + coll + " is " + ss.selectedCollectionPartitionCount()); } })();
Visit the Metrics tab in the portal, select the database and container, and then run the bookmarklet to see the count in an alert box.
You can also see this information from the pkranges REST endpoint, which is used by the SDK. Some code that works with the V2 SDK is below:
var documentClient = new DocumentClient(new Uri(endpointUrl), authorizationKey,
    new ConnectionPolicy
    {
        ConnectionMode = ConnectionMode.Direct
    });

var partitionKeyRangesUri = UriFactory.CreatePartitionKeyRangesUri(dbName, collectionName);

FeedResponse<PartitionKeyRange> response = null;
do
{
    response = await documentClient.ReadPartitionKeyRangeFeedAsync(partitionKeyRangesUri,
        new FeedOptions
        {
            MaxItemCount = 1000,
            // Pass the continuation from the previous page, if any.
            RequestContinuation = response?.ResponseContinuation
        });
    foreach (var pkRange in response)
    {
        //TODO: Something with the pkRange
    }
} while (!string.IsNullOrEmpty(response.ResponseContinuation));
I have a partitioned collection that uses a 5-digit membership code as its partition key. There could be thousands of partition keys in the collection.
I upserted about 32K documents in it. Using the Partition Stats sample:
Summary:
partitions: 1
documentsCount: 32,190
documentsSize: 0.045 GB
But there is only a single physical partition! If I use the portal metrics, I see a similar thing.
Doesn't this mean that all my queries are going against a single physical partition? When does Cosmos add more physical partitions?
The reason I ask is that I am seeing really poor performance that seriously deteriorates when I load test. For example, this simple count method starts off fast under a light test and then takes seconds when the system is under pressure (ignore the handler stuff):
private async Task<int> RetrieveDigitalMembershipRefreshItemsCount(string code, string correlationId)
{
var error = "";
double cost = 0;
var handler = HandlersFactory.GetProfilerHandler(_loggerService, _settingService);
handler.Start(LOG_TAG, "RetrieveDigitalMembershipRefreshItemsCount", correlationId);
try
{
if (this._docDbClient == null)
throw new Exception("No singleton DocDb client!");
// Check to see if there is a URL
if (string.IsNullOrEmpty(_docDbDigitalMembershipsCollectionName))
throw new Exception("No Digital Memberships collection defined!");
FeedOptions queryOptions = new FeedOptions { MaxItemCount = 1, PartitionKey = new Microsoft.Azure.Documents.PartitionKey(code.ToUpper()) };
return await _docDbClient.CreateDocumentQuery<DigitalMembershipDeviceRegistrationItem>(
UriFactory.CreateDocumentCollectionUri(_docDbDatabaseName, _docDbDigitalMembershipsCollectionName), queryOptions)
.Where(e => e.CollectionType == DigitalMembershipCollectionTypes.RefreshItem && e.Code.ToUpper() == code.ToUpper())
.CountAsync();
}
catch (Exception ex)
{
error = ex.Message;
throw new Exception(error);
}
finally
{
handler.Stop(error, cost, new
{
Code = code
});
}
}
Here is the log for this method as the test progresses, ordered by the longest duration. Initially it takes only a few milliseconds.
I tried most of the performance tips, i.e. direct mode, same region, and a singleton client. Any help would be really appreciated.
Thanks.
The partition key defines the logical partition and is used to distribute data across physical partitions. Physical partition management is handled by Azure Cosmos DB.
Auto-split of a physical partition can happen as long as the initial container was created with at least 1,000 RU/s of throughput and a partition key was specified. A split mainly redistributes the logical partitions of one physical partition across different physical partitions, and the process is transparent to us.
There are two scenarios in which a physical partition split occurs:
You provision throughput higher than the existing physical partitions can support; Azure Cosmos DB then splits one or more of your physical partitions to serve the higher throughput.
A physical partition p reaches its storage limit; Azure Cosmos DB seamlessly splits p into two new physical partitions. If p contains only one logical partition, the split won't occur.
So if those conditions are not met, you have only a single physical partition. But queries still go against the specified logical partition via the partition key.
For more details, please refer to Azure Cosmos DB partition.
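To check whether the slow count is a query cost problem rather than a partition layout problem, it can help to log the RU charge per page. A minimal sketch reusing the names from the question's code (V2 SDK assumed; Microsoft.Azure.Documents.Linq is needed for AsDocumentQuery and ExecuteNextAsync):
var query = _docDbClient.CreateDocumentQuery<DigitalMembershipDeviceRegistrationItem>(
        UriFactory.CreateDocumentCollectionUri(_docDbDatabaseName, _docDbDigitalMembershipsCollectionName),
        new FeedOptions { PartitionKey = new Microsoft.Azure.Documents.PartitionKey(code.ToUpper()) })
    .Where(e => e.CollectionType == DigitalMembershipCollectionTypes.RefreshItem)
    .AsDocumentQuery();

double totalCharge = 0;
int count = 0;
while (query.HasMoreResults)
{
    var page = await query.ExecuteNextAsync<DigitalMembershipDeviceRegistrationItem>();
    totalCharge += page.RequestCharge;   // RU cost of this page
    count += page.Count;
}
Console.WriteLine($"Count: {count}, total RU charge: {totalCharge}");
If the per-request charge stays flat but latency grows under load, throttling against the provisioned RUs is the more likely culprit than the number of physical partitions.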
What are some ways to optimize the retrieval of large numbers of entities (~250K) from a single partition from Azure Table Storage to a .NET application?
As far as I know, there are two ways to optimize the retrieval of large numbers of entities from a single partition from Azure Table Storage to a .NET application.
1. If you don't need all the properties of the entity, I suggest using server-side projection.
A single entity can have up to 255 properties and be up to 1 MB in size. When you query the table and retrieve entities, you may not need all the properties and can avoid transferring data unnecessarily (to help reduce latency and cost). You can use server-side projection to transfer just the properties you need.
From: Azure Storage Table Design Guide: Designing Scalable and Performant Tables (Server-side projection)
For more details, refer to the following code:
string filter = TableQuery.GenerateFilterCondition(
"PartitionKey", QueryComparisons.Equal, "Sales");
List<string> columns = new List<string>() { "Email" };
TableQuery<EmployeeEntity> employeeQuery =
new TableQuery<EmployeeEntity>().Where(filter).Select(columns);
var entities = employeeTable.ExecuteQuery(employeeQuery);
foreach (var e in entities)
{
Console.WriteLine("RowKey: {0}, EmployeeEmail: {1}", e.RowKey, e.Email);
}
2. If you just want to display the table's data, you don't need to get all the entities at the same time.
You can retrieve part of the result first.
If you want the rest of the result, you can use the continuation token.
This improves perceived table query performance.
A query against the table service may return a maximum of 1,000 entities at one time and may execute for a maximum of five seconds. If the result set contains more than 1,000 entities, if the query did not complete within five seconds, or if the query crosses the partition boundary, the Table service returns a continuation token to enable the client application to request the next set of entities. For more information about how continuation tokens work, see Query Timeout and Pagination.
From: Azure Storage Table Design Guide: Designing Scalable and Performant Tables (Retrieving large numbers of entities from a query)
By using continuation tokens explicitly, you can control when your application retrieves the next segment of data.
For more details, refer to the following code:
string filter = TableQuery.GenerateFilterCondition(
"PartitionKey", QueryComparisons.Equal, "Sales");
TableQuery<EmployeeEntity> employeeQuery =
new TableQuery<EmployeeEntity>().Where(filter);
TableContinuationToken continuationToken = null;
do
{
var employees = employeeTable.ExecuteQuerySegmented(
employeeQuery, continuationToken);
foreach (var emp in employees)
{
...
}
continuationToken = employees.ContinuationToken;
} while (continuationToken != null);
Besides, I suggest paying attention to the table partition scalability targets:
Target throughput for a single table partition (1 KB entities): up to 2,000 entities per second
If you reach the scalability target for a partition, the storage service will throttle your requests.
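If that happens, throttled calls typically surface as HTTP 503 (Server Busy), and you can let the client library back off and retry for you. A minimal sketch, reusing employeeTable, employeeQuery and continuationToken from the code above (the delay and retry count are illustrative):
// Requires: using Microsoft.WindowsAzure.Storage.RetryPolicies;
var requestOptions = new TableRequestOptions
{
    RetryPolicy = new ExponentialRetry(TimeSpan.FromSeconds(2), 5)   // illustrative values
};

var employees = employeeTable.ExecuteQuerySegmented(
    employeeQuery, continuationToken, requestOptions, operationContext: null);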
In my Table storage, there are 10,000 elements per partition. Now I would like to load a whole partition into memory. However, this takes very long. I was wondering whether I am doing something wrong, or whether there is a way to do this faster. Here is my code:
public List<T> GetPartition<T>(string partitionKey) where T : TableServiceEntity
{
CloudTableQuery<T> partitionQuery = (from e in _context.CreateQuery<T>(TableName)
where e.PartitionKey == partitionKey
select e).AsTableServiceQuery<T>();
return partitionQuery.ToList();
}
Is this the way it is supposed to be done, or is there anything equivalent to batch insertion for getting elements out of the table again?
Thanks a lot,
Christian
EDIT
We also have all the data available in blob storage. That means one partition is serialized completely as a byte[] and saved in a blob. When I retrieve it from blob storage and then deserialize it, it is way faster than fetching it from the table. Almost 10 times faster! How can this be?
In your case I think turning off change tracking could make a difference:
context.MergeOption = MergeOption.NoTracking;
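In the context of the GetPartition method from the question, that would look roughly like this (a sketch, assuming the same _context and TableName):
public List<T> GetPartition<T>(string partitionKey) where T : TableServiceEntity
{
    // Read-only query: skip change/identity tracking in the underlying DataServiceContext.
    _context.MergeOption = MergeOption.NoTracking;

    CloudTableQuery<T> partitionQuery = (from e in _context.CreateQuery<T>(TableName)
                                         where e.PartitionKey == partitionKey
                                         select e).AsTableServiceQuery<T>();
    return partitionQuery.ToList();
}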
Take a look at MSDN for other possible improvements: .NET and ADO.NET Data Service Performance Tips for Windows Azure Tables.
Edit: To answer your question about why a big file in blob storage is faster: the maximum number of records you can get in a single table request is 1,000. This means that fetching 10,000 items requires 10 requests, instead of a single request against blob storage. Also, when working with blob storage you don't go through WCF Data Services, which can also have a big impact.
In addition, make sure you are on the second generation of Azure Storage; it's essentially a no-cost upgrade if you are in a data center that supports it. It uses SSDs and an upgraded network topology.
http://blogs.msdn.com/b/windowsazure/archive/2012/11/02/windows-azure-s-flat-network-storage-and-2012-scalability-targets.aspx
Microsoft will not migrate your existing account; simply re-create it and you get upgraded to second-generation Azure Storage for free.
Is there an equivalent to TransactionScope that you can use with Azure Table Storage?
what I'm trying to do is the following:
using (TransactionScope scope = new TransactionScope()) {
account.balance -= 10;
purchaseOrders.Add(order);
accountDataSource.SaveChanges();
purchaseOrdersDataSource.SaveChanges();
scope.Complete();
}
If for some reason saving the account works but saving the purchase order fails, I don't want the account balance to be decremented.
Within a single table and single partition, you may write multiple rows in an entity group transaction. There's no built-in transaction mechanism when crossing partitions or tables.
That said: remember that tables are schema-less, so if you really need a transaction, you could store both your account row and your purchase order row in the same table and the same partition, and do a single (transactional) save.
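A minimal sketch of such an entity group transaction with the storage client library, assuming table is a CloudTable holding both kinds of rows (entity shapes and values are illustrative):
// Both entities share the same PartitionKey, so the batch commits atomically.
var account = new DynamicTableEntity("account-123", "account");
account.Properties["Balance"] = new EntityProperty(90);

var order = new DynamicTableEntity("account-123", "order-0001");
order.Properties["Amount"] = new EntityProperty(10);

var batch = new TableBatchOperation();
batch.InsertOrMerge(account);   // upsert the updated balance
batch.Insert(order);            // add the purchase order
table.ExecuteBatch(batch);      // all-or-nothing within the partition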