Azure Table Query Limits with ExecuteQuerySegmentedAsync vs ExecuteQuery

What are the limits of calling ExecuteQuery()? For example, limits on the number of entities and download size.
In other words, when will the method below hit its limits?
private static void ExecuteSimpleQuery(CloudTable table, string partitionKey, string startRowKey, string endRowKey)
{
    try
    {
        // Create the range query using the fluent API
        TableQuery<CustomerEntity> rangeQuery = new TableQuery<CustomerEntity>().Where(
            TableQuery.CombineFilters(
                TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, partitionKey),
                TableOperators.And,
                TableQuery.CombineFilters(
                    TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.GreaterThanOrEqual, startRowKey),
                    TableOperators.And,
                    TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.LessThanOrEqual, endRowKey))));

        foreach (CustomerEntity entity in table.ExecuteQuery(rangeQuery))
        {
            Console.WriteLine("Customer: {0},{1}\t{2}\t{3}", entity.PartitionKey, entity.RowKey, entity.Email, entity.PhoneNumber);
        }
    }
    catch (StorageException e)
    {
        Console.WriteLine(e.Message);
        Console.ReadLine();
        throw;
    }
}
The method below uses ExecuteQuerySegmentedAsync with a TakeCount of 50. But how should the 50 be determined? I think that depends on the answers to my questions above.
private static async Task PartitionRangeQueryAsync(CloudTable table, string partitionKey, string startRowKey, string endRowKey)
{
    try
    {
        // Create the range query using the fluent API
        TableQuery<CustomerEntity> rangeQuery = new TableQuery<CustomerEntity>().Where(
            TableQuery.CombineFilters(
                TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, partitionKey),
                TableOperators.And,
                TableQuery.CombineFilters(
                    TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.GreaterThanOrEqual, startRowKey),
                    TableOperators.And,
                    TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.LessThanOrEqual, endRowKey))));

        // Request 50 results at a time from the server.
        TableContinuationToken token = null;
        rangeQuery.TakeCount = 50;
        int segmentNumber = 0;
        do
        {
            // Execute the query, passing in the continuation token.
            // The first time this method is called, the continuation token is null.
            // If there are more results, the call populates the continuation token for use in the next call.
            TableQuerySegment<CustomerEntity> segment = await table.ExecuteQuerySegmentedAsync(rangeQuery, token);

            // Indicate which segment is being displayed
            if (segment.Results.Count > 0)
            {
                segmentNumber++;
                Console.WriteLine();
                Console.WriteLine("Segment {0}", segmentNumber);
            }

            // Save the continuation token for the next call to ExecuteQuerySegmentedAsync
            token = segment.ContinuationToken;

            // Write out the properties for each entity returned.
            foreach (CustomerEntity entity in segment)
            {
                Console.WriteLine("\t Customer: {0},{1}\t{2}\t{3}", entity.PartitionKey, entity.RowKey, entity.Email, entity.PhoneNumber);
            }

            Console.WriteLine();
        }
        while (token != null);
    }
    catch (StorageException e)
    {
        Console.WriteLine(e.Message);
        Console.ReadLine();
        throw;
    }
}
Examples are from the link below:
https://github.com/Azure-Samples/storage-table-dotnet-getting-started

For ExecuteQuerySegmentedAsync, the limit is 1000. This is based on the limitation posed by the REST API, where a single request to the table service can return a maximum of 1,000 entities (Ref: https://learn.microsoft.com/en-us/rest/api/storageservices/query-timeout-and-pagination).
The ExecuteQuery method will try to return all entities matching a query. Internally, it fetches a maximum of 1,000 entities per iteration and keeps fetching the next set as long as the response from the table service includes a continuation token.
UPDATE
If ExecuteQuery performs pagination automatically, it seems easier to use than ExecuteQuerySegmentedAsync. Why must I use ExecuteQuerySegmentedAsync? And what about download size? 1000 entities regardless of their sizes?
With ExecuteQuery, there's no way for you to break out of the loop. This becomes problematic when you have a lot of entities in a table. You have that flexibility with ExecuteQuerySegmentedAsync. For example, let's assume you want to download all entities from a very large table and save them locally. If you use ExecuteQuerySegmentedAsync, you can save the entities in different files.
Regarding your comment about 1000 entities regardless of their size, the answer is yes. Please keep in mind that the maximum size of each entity is 1 MB.
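To illustrate that flexibility, here is a minimal sketch (assuming the CustomerEntity type from the samples above, plus System.IO and Json.NET for the serialization; the file naming is purely illustrative) that writes each segment to its own file and can stop early at any segment boundary:
private static async Task ExportTableInChunksAsync(CloudTable table, TableQuery<CustomerEntity> query)
{
    TableContinuationToken token = null;
    int fileNumber = 0;
    do
    {
        TableQuerySegment<CustomerEntity> segment = await table.ExecuteQuerySegmentedAsync(query, token);
        token = segment.ContinuationToken;

        // Each segment (up to 1,000 entities, or TakeCount if smaller) goes into
        // its own file; ExecuteQuery cannot do this because it hides the segment
        // boundaries from you.
        string fileName = $"export-{fileNumber++}.json"; // illustrative naming
        File.WriteAllText(fileName, JsonConvert.SerializeObject(segment.Results));

        // You could also stop early here (e.g. after N segments or on a time budget)
        // simply by breaking out of the loop.
    }
    while (token != null);
}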

Related

Azure Cosmos Db, select after row?

I'm trying to select some rows after x rows, something like:
SELECT * from collection WHERE ROWNUM >= 235 and ROWNUM <= 250
Unfortunately it looks like ROWNUM isn't resolved in azure cosmos db.
Is there another way to do this? I've looked at using continuation tokens, but they're not helpful if a user skips to page 50: would I need to keep querying with continuation tokens until I reach page 50?
I've tried playing around with the page size option but that has some limitations in terms of how many things it can return at any one time.
For example, I have 1,000,000 records in Azure. I want to query rows 500,000 to 500,010. I can't do SELECT * from collection WHERE ROWNUM >= 500,000 and ROWNUM <= 500,010, so how do I achieve this?
If you don't have any filters, you can't retrieve items in a specific range via a SQL query directly in Cosmos DB so far, so you need to use pagination to locate your desired items. As far as I know, pagination is supported only via continuation tokens at this time.
Please refer to the function below:
using JayGongDocumentDB.pojo;
using Microsoft.Azure.Documents.Client;
using Microsoft.Azure.Documents.Linq;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

namespace JayGongDocumentDB.module
{
    class QuerySample1
    {
        public static async void QueryPageByPage()
        {
            // Number of documents per page
            const int PAGE_SIZE = 2;

            int currentPageNumber = 1;
            int documentNumber = 1;

            // Continuation token for subsequent queries (NULL for the very first request/page)
            string continuationToken = null;

            do
            {
                Console.WriteLine($"----- PAGE {currentPageNumber} -----");

                // Loads ALL documents for the current page
                KeyValuePair<string, IEnumerable<Student>> currentPage = await QueryDocumentsByPage(currentPageNumber, PAGE_SIZE, continuationToken);

                foreach (Student student in currentPage.Value)
                {
                    Console.WriteLine($"[{documentNumber}] {student.Name}");
                    documentNumber++;
                }

                // Keep the continuation token for the next page query execution
                continuationToken = currentPage.Key;
                currentPageNumber++;
            } while (continuationToken != null);

            Console.WriteLine("\n--- END: Finished Querying ALL Documents ---");
        }

        public static async Task<KeyValuePair<string, IEnumerable<Student>>> QueryDocumentsByPage(int pageNumber, int pageSize, string continuationToken)
        {
            DocumentClient documentClient = new DocumentClient(new Uri("https://***.documents.azure.com:443/"), "***");

            var feedOptions = new FeedOptions
            {
                MaxItemCount = pageSize,
                EnableCrossPartitionQuery = true,

                // IMPORTANT: Set the continuation token (NULL for the first ever request/page)
                RequestContinuation = continuationToken
            };

            IQueryable<Student> filter = documentClient.CreateDocumentQuery<Student>("dbs/db/colls/item", feedOptions);
            IDocumentQuery<Student> query = filter.AsDocumentQuery();

            FeedResponse<Student> feedResponse = await query.ExecuteNextAsync<Student>();

            List<Student> documents = new List<Student>();
            foreach (Student t in feedResponse)
            {
                documents.Add(t);
            }

            // IMPORTANT: Keep the continuation token for the next requests
            return new KeyValuePair<string, IEnumerable<Student>>(feedResponse.ResponseContinuation, documents);
        }
    }
}
Hope it helps you.
Update Answer:
There is no function like ROW_NUMBER() (see How do I use ROW_NUMBER()?) in Cosmos DB so far. I also considered SKIP and TOP; however, only TOP is supported, and SKIP is not yet (see the feedback item). It seems SKIP is already in progress and will be released in the future.
You could vote for the feedback related to the paging function, or use the continuation token approach above as a temporary workaround.
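If you do need to jump straight to page N, one workaround (my sketch, reusing the QueryDocumentsByPage method above; the cost still grows with N, since every intervening page must be requested to obtain its token) is to walk the continuation tokens forward until the target page:
public static async Task<IEnumerable<Student>> GetPage(int targetPageNumber, int pageSize)
{
    string continuationToken = null;
    KeyValuePair<string, IEnumerable<Student>> page = new KeyValuePair<string, IEnumerable<Student>>(null, Enumerable.Empty<Student>());

    for (int pageNumber = 1; pageNumber <= targetPageNumber; pageNumber++)
    {
        // Fetch the next page; only the last iteration's results are kept.
        page = await QueryDocumentsByPage(pageNumber, pageSize, continuationToken);
        continuationToken = page.Key;

        // Fewer pages exist than requested.
        if (continuationToken == null && pageNumber < targetPageNumber)
            return Enumerable.Empty<Student>();
    }
    return page.Value;
}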

Parallel StartsWith Queries against Azure Table Storage

I have the following method that performs a startsWith query on RowKey in a table on Azure Table Storage. I now want to run parallel queries using startsWith on RowKey.
Is it possible to create a parallel method that simply calls my existing method or will I have to create a parallel version of my existing method?
Here's my current startsWith method:
public async Task<IEnumerable<T>> RowKeyStartsWith<T>(
    string searchString,
    string tableName,
    string partitionKey,
    string columnName = "RowKey") where T : ITableEntity, new()
{
    // Make sure we have a search string
    if (string.IsNullOrEmpty(searchString)) return null;

    // Get CloudTable
    var table = GetTable(tableName);

    // Compute the upper bound of the prefix range by incrementing the last character
    char lastChar = searchString[searchString.Length - 1];
    char nextLastChar = (char)((int)lastChar + 1);
    string nextSearchStr = searchString.Substring(0, searchString.Length - 1) + nextLastChar;

    // Define query segment(s)
    string prefixCondition = TableQuery.CombineFilters(
        TableQuery.GenerateFilterCondition(columnName, QueryComparisons.GreaterThanOrEqual, searchString),
        TableOperators.And,
        TableQuery.GenerateFilterCondition(columnName, QueryComparisons.LessThan, nextSearchStr)
    );

    string filterString = TableQuery.CombineFilters(
        TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, partitionKey),
        TableOperators.And,
        prefixCondition
    );

    // Create final query
    var query = new TableQuery<T>().Where(filterString);

    // Declare result variable
    var result = new List<T>();

    // Execute query asynchronously, segment by segment
    TableContinuationToken continuationToken = null;
    do
    {
        TableQuerySegment<T> segment = await table.ExecuteQuerySegmentedAsync(query, continuationToken);
        result.AddRange(segment.ToList());
        continuationToken = segment.ContinuationToken;
    } while (continuationToken != null);

    return result;
}
Is it possible to create a parallel method that simply calls my existing method or will I have to create a parallel version of my existing method?
Per my understanding, you could reuse your existing method and execute your queries with multiple tasks as follows:
// For storing the query results
ConcurrentDictionary<string, object> resultDics = new ConcurrentDictionary<string, object>();

// Simulate your search parameters; RowKeyStartsWithParamModel is assumed to be a
// simple POCO with SearchString, TableName, ColumnName, and PartitionKey string properties
List<RowKeyStartsWithParamModel> rowKeySearchs = Enumerable.Range(1, 10)
    .Select(i => new RowKeyStartsWithParamModel()
    {
        SearchString = i.ToString(),
        TableName = "tablename",
        ColumnName = "RowKey",
        PartitionKey = "partitionKey"
    }).ToList();

// Create multiple tasks to execute your jobs
var tasks = rowKeySearchs.Select(item => Task.Run(async () =>
{
    // Invoke your existing RowKeyStartsWith; CustomerEntity stands in for any
    // type satisfying the ITableEntity, new() constraint
    var results = await RowKeyStartsWith<CustomerEntity>(item.SearchString, item.TableName, item.PartitionKey, item.ColumnName);

    // Add retrieved results
    resultDics.TryAdd(item.SearchString, results);
}));

// Synchronously wait for all tasks to execute completely
Task.WaitAll(tasks.ToArray());

// Print all retrieved results
foreach (var item in resultDics)
{
    Console.WriteLine($"{item.Key},{JsonConvert.SerializeObject(item.Value)}");
}
Moreover, you could leverage Parallel as follows:
Parallel.ForEach(rowKeySearchs, async (item) =>
{
    var results = await RowKeyStartsWith<CustomerEntity>(item.SearchString, item.TableName, item.PartitionKey, item.ColumnName);
    resultDics.TryAdd(item.SearchString, results);
});
Note: since you use await in the delegate for each iteration, Parallel.ForEach returns before the queries have completed, so you cannot read the query results synchronously right after it.
In order to synchronously retrieve the results with the above code snippet, you could leverage the following approaches:
1) Synchronously retrieve the results when invoking RowKeyStartsWith within each iteration of Parallel.ForEach:
var results = RowKeyStartsWith<CustomerEntity>(item.SearchString, item.TableName, item.PartitionKey, item.ColumnName).Result;
2) Leverage WaitHandle to block until all query results have been collected:
var waitHandles = rowKeySearchs.Select(d => new EventWaitHandle(false, EventResetMode.ManualReset)).ToArray();

Parallel.ForEach(rowKeySearchs, async (item, loopState, index) =>
{
    var results = await RowKeyStartsWith<CustomerEntity>(item.SearchString, item.TableName, item.PartitionKey, item.ColumnName);
    resultDics.TryAdd(item.SearchString, results);
    waitHandles[index].Set(); // Release this item's handle
});

WaitHandle.WaitAll(waitHandles); // Block the current thread until all EventWaitHandles are released
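As a further option (my sketch, not from the original answer, reusing the rowKeySearchs list and RowKeyStartsWith method above), Task.WhenAll avoids both the blocking Task.WaitAll and the async-delegate pitfall of Parallel.ForEach, since execution resumes only once every query task has completed:
var queries = rowKeySearchs.ToDictionary(
    item => item.SearchString,
    item => RowKeyStartsWith<CustomerEntity>(item.SearchString, item.TableName, item.PartitionKey, item.ColumnName));

// Await all queries at once; an exception from any query surfaces here.
await Task.WhenAll(queries.Values);

foreach (var pair in queries)
{
    // Each task has completed, so reading Result no longer blocks.
    Console.WriteLine($"{pair.Key}: {pair.Value.Result.Count()} entities");
}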

azure table storage pagination for request 10 items each time

Basically I am trying to get pagination working when requesting entities from Azure Table Storage, i.e. pressing a Next button gets the next 10 entities and pressing a Previous button gets the previous 10 entities. A relatively close example is Gaurav Mantri's answer. But my question is: how do I get the nextPartitionKey and nextRowKey from an HTML button attribute and store them in an array/list in order to keep track of the current page, so I can get the next/previous items? A code example would be very much appreciated.
Thanks!
This is something I have right now, which gets a range of data based on the requested page number:
private async Task<List<UserInfo>> queryPage(CloudTable peopleTable, string item, int pageNumber)
{
    // Construct the query operation for all customer entities
    TableQuery<CustomerEntity> query = new TableQuery<CustomerEntity>().Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, item));

    TableContinuationToken token = null;
    List<UserInfo> data = new List<UserInfo>();
    do
    {
        TableQuerySegment<CustomerEntity> resultSegment = await peopleTable.ExecuteQuerySegmentedAsync(query, token);
        token = resultSegment.ContinuationToken;

        foreach (CustomerEntity entity in resultSegment.Results)
        {
            data.Add(new UserInfo
            {
                // add data
            });
        }
    } while (token != null);

    // Get a subset of all entities
    List<UserInfo> sublist = data.GetRange(0, pageNumber);
    return sublist;
}
Managed to solve the problem with Gaurav's help.
Here is the code, not perfect but works.
private async Task<List<UserInfo>> queryPage(CloudTable peopleTable, string item, string NextPartitionKey, string NextRowKey, int itemNumber)
{
    // Construct the query operation for all customer entities, taking itemNumber entities per page
    TableQuery<CustomerEntity> query = new TableQuery<CustomerEntity>().Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, item)).Take(itemNumber);

    List<UserInfo> data = new List<UserInfo>();

    // Rebuild the continuation token from the two key strings (guard against the
    // field having been set to null after the last page)
    if (Tabletoken == null)
    {
        Tabletoken = new TableContinuationToken();
    }
    Tabletoken.NextPartitionKey = NextPartitionKey;
    Tabletoken.NextRowKey = NextRowKey;

    TableQuerySegment<CustomerEntity> resultSegment = await peopleTable.ExecuteQuerySegmentedAsync(query, Tabletoken);
    Tabletoken = resultSegment.ContinuationToken;

    foreach (CustomerEntity entity in resultSegment.Results)
    {
        data.Add(new UserInfo
        {
            // add data
        });
    }
    return data;
}

private TableContinuationToken Tabletoken = new TableContinuationToken();

and return the token parts alongside the data using a tuple:

Tuple<List<UserInfo>, string, string> tuple =
    new Tuple<List<UserInfo>, string, string>(data, Tabletoken?.NextPartitionKey, Tabletoken?.NextRowKey);
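To also support a Previous button, one approach (my sketch, not part of Gaurav's answer) is to record the token pair that starts each page as you move forward; showing page i then just replays its stored pair through the queryPage method above. The two strings can likewise be embedded in the Next/Previous buttons' data attributes and posted back with each request, which covers the HTML side of the question:
// pageStartTokens[i] holds the (NextPartitionKey, NextRowKey) pair that starts page i;
// page 0 starts with a null token
private readonly List<Tuple<string, string>> pageStartTokens =
    new List<Tuple<string, string>> { Tuple.Create<string, string>(null, null) };

private async Task<List<UserInfo>> GetPage(CloudTable peopleTable, string item, int pageIndex, int itemNumber)
{
    var start = pageStartTokens[pageIndex];
    List<UserInfo> data = await queryPage(peopleTable, item, start.Item1, start.Item2, itemNumber);

    // After fetching the newest page, remember where the following page starts
    if (pageIndex == pageStartTokens.Count - 1 && Tabletoken != null)
    {
        pageStartTokens.Add(Tuple.Create(Tabletoken.NextPartitionKey, Tabletoken.NextRowKey));
    }
    return data;
}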

QueryExpression with no results in Dynamics CRM plugin

I wrote the following function to get the SharePointDocumentLocation records regarding an account or contact. However, even though I provide an id that most definitely has an SPDL record associated with it, the count of the returned EntityCollection is always 0. Why does my query not return SPDL records?
internal static EntityCollection GetSPDocumentLocation(IOrganizationService service, Guid id)
{
    QueryExpression query = new QueryExpression
    {
        EntityName = "sharepointdocumentlocation",
        ColumnSet = new ColumnSet("sharepointdocumentlocationid"),
        Criteria = new FilterExpression
        {
            Conditions =
            {
                new ConditionExpression
                {
                    AttributeName = "regardingobjectid",
                    Operator = ConditionOperator.Equal,
                    Values = { id }
                }
            }
        }
    };
    return service.RetrieveMultiple(query);
}
The following code does work
using System;
using Microsoft.Xrm.Sdk;
using Microsoft.Xrm.Sdk.Client;
using System.ServiceModel.Description;
using System.Net;
using Microsoft.Xrm.Sdk.Query;

namespace CRMConsoleTests
{
    class Program
    {
        static void Main(string[] args)
        {
            ClientCredentials credentials = new ClientCredentials();
            credentials.Windows.ClientCredential = CredentialCache.DefaultNetworkCredentials;
            Uri orgUri = new Uri("http://localhost/CRMDEV2/XRMServices/2011/Organization.svc");
            Uri homeRealmUri = null;

            using (OrganizationServiceProxy service = new OrganizationServiceProxy(orgUri, homeRealmUri, credentials, null))
            {
                //ConditionExpression ce = new ConditionExpression("regardingobjectid", ConditionOperator.Equal, new Guid(""));
                QueryExpression qe = new QueryExpression("sharepointdocumentlocation");
                qe.ColumnSet = new ColumnSet(new String[] { "sharepointdocumentlocationid", "regardingobjectid" });
                //qe.Criteria.AddCondition(ce);

                EntityCollection result = service.RetrieveMultiple(qe);
                foreach (Entity entity in result.Entities)
                {
                    Console.WriteLine("Results for the first record: ");
                    SharePointDocumentLocation spd = entity.ToEntity<SharePointDocumentLocation>();
                    if (spd.RegardingObjectId != null)
                    {
                        Console.WriteLine("Id: " + spd.SharePointDocumentLocationId.ToString() + " with RoId: " + spd.RegardingObjectId.Id.ToString());
                    }
                }
                Console.ReadLine();
            }
        }
    }
}
It retrieves 4 records, and when I debug the plugin code above it retrieves 3 records.
Everything looks good with your QueryExpression, although I'd write it a little more concisely (something like this):
var qe = new QueryExpression(SharePointDocumentLocation.EntityLogicalName)
{
    ColumnSet = new ColumnSet("sharepointdocumentlocationid"),
};
qe.Criteria.AddCondition("regardingobjectid", ConditionOperator.Equal, id);
Since I don't see anything wrong with the QueryExpression itself, that leaves me with two guesses:
1. You're using impersonation on the IOrganizationService and the impersonated user doesn't have rights to the SharePointDocumentLocation. You won't get an error; you just won't get any records returned.
2. The id you're passing in is incorrect.
I'd remove the Criteria and see how many records you get back. If you don't get all of the records back, you know your issue is with guess #1.
If you get all records, add the regardingobjectid to the ColumnSet and retrieve the first record without any Criteria in the QueryExpression, then call this method passing in the id of the regardingobject you returned. If nothing is received when adding the regardingobjectid constraint, then something else is wrong.
Update
Since this is executing within the delete event of the plugin, the system must be performing its cascade deletes before your plugin fires. You can try the Pre-Validation stage.
Now that I think of it, it must perform the deletion of the cascading entities in the Validation stage, because if one of them can't be deleted, the entity itself can't be deleted.

tableclient.RetryPolicy Vs. TransientFaultHandling

Both myself and a colleague have been tasked with finding connection-retry logic for Azure Table Storage. After some searching, I found this really cool Enterprise Library suite, which contains the Microsoft.Practices.TransientFaultHandling namespace.
Following a few code examples, I ended up creating an Incremental retry strategy and wrapping one of our storage calls with the retry policy's ExecuteAction callback handler:
/// <inheritdoc />
public void SaveSetting(int userId, string bookId, string settingId, string itemId, JObject value)
{
    // Define your retry strategy: retry 5 times, starting 1 second apart, adding 2 seconds to the interval each retry.
    var retryStrategy = new Incremental(5, TimeSpan.FromSeconds(1), TimeSpan.FromSeconds(2));

    // Wrap the strategy in a retry policy (the original snippet referenced retryPolicy
    // without defining it; StorageTransientErrorDetectionStrategy is the usual choice here)
    var retryPolicy = new RetryPolicy<StorageTransientErrorDetectionStrategy>(retryStrategy);

    var storageAccount = CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting(StorageConnectionStringName));

    try
    {
        retryPolicy.ExecuteAction(() =>
        {
            var tableClient = storageAccount.CreateCloudTableClient();
            var table = tableClient.GetTableReference(SettingsTableName);
            table.CreateIfNotExists();

            var entity = new Models.Azure.Setting
            {
                PartitionKey = GetPartitionKey(userId, bookId),
                RowKey = GetRowKey(settingId, itemId),
                UserId = userId,
                BookId = bookId.ToLowerInvariant(),
                SettingId = settingId.ToLowerInvariant(),
                ItemId = itemId.ToLowerInvariant(),
                Value = value.ToString(Formatting.None)
            };

            table.Execute(TableOperation.InsertOrReplace(entity));
        });
    }
    catch (StorageException exception)
    {
        ExceptionHelpers.CheckForPropertyValueTooLargeMessage(exception);
        throw;
    }
}
Feeling awesome, I went to go show my colleague, and he smugly noted that we could do the same thing without having to include Enterprise Library, as the CloudTableClient object already has a setter for a retry policy. His code ended up looking like :
/// <inheritdoc />
public void SaveSetting(int userId, string bookId, string settingId, string itemId, JObject value)
{
    var storageAccount = CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting(StorageConnectionStringName));
    var tableClient = storageAccount.CreateCloudTableClient();

    // Set retry policy for the connection
    tableClient.RetryPolicy = new ExponentialRetry(TimeSpan.FromSeconds(2), 3);

    var table = tableClient.GetTableReference(SettingsTableName);
    table.CreateIfNotExists();

    var entity = new Models.Azure.Setting
    {
        PartitionKey = GetPartitionKey(userId, bookId),
        RowKey = GetRowKey(settingId, itemId),
        UserId = userId,
        BookId = bookId.ToLowerInvariant(),
        SettingId = settingId.ToLowerInvariant(),
        ItemId = itemId.ToLowerInvariant(),
        Value = value.ToString(Formatting.None)
    };

    try
    {
        table.Execute(TableOperation.InsertOrReplace(entity));
    }
    catch (StorageException exception)
    {
        ExceptionHelpers.CheckForPropertyValueTooLargeMessage(exception);
        throw;
    }
}
My Question :
Is there any major difference between these two approaches, aside from their implementations? They both seem to accomplish the same goal, but are there cases where it's better to use one over the other?
Functionally speaking, both are the same: they both retry requests in case of transient errors. However, there are a few differences:
Retry policy handling in the storage client library only handles retries for storage operations, while the transient fault handling application block handles not only storage operations but also SQL Azure, Service Bus, and Cache operations in case of transient errors. So if you have a project where you're using more than just storage but would like a single approach for handling transient errors, you may want to use the transient fault handling application block.
One thing I liked about the transient fault handling block is that you can intercept retry operations, which you can't do with the retry policy. For example, look at the code below:
var retryManager = EnterpriseLibraryContainer.Current.GetInstance<RetryManager>();
var retryPolicy = retryManager.GetRetryPolicy<StorageTransientErrorDetectionStrategy>(ConfigurationHelper.ReadFromServiceConfigFile(Constants.DefaultRetryStrategyForTableStorageOperationsKey));

retryPolicy.Retrying += (sender, args) =>
{
    // Log details of the retry.
    var message = string.Format(CultureInfo.InvariantCulture, TableOperationRetryTraceFormat, "TableStorageHelper::CreateTableIfNotExist",
        storageAccount.Credentials.AccountName, tableName, args.CurrentRetryCount, args.Delay);
    TraceHelper.TraceError(message, args.LastException);
};

try
{
    var isTableCreated = retryPolicy.ExecuteAction(() =>
    {
        var table = storageAccount.CreateCloudTableClient().GetTableReference(tableName);
        return table.CreateIfNotExists(requestOptions, operationContext);
    });
    return isTableCreated;
}
catch (Exception)
{
    throw;
}
In the code example above, I could intercept retry operations and do something there if I wanted to. This is not possible with the storage client library.
Having said all of this, it is generally recommended to go with the storage client library's retry policy for retrying storage operations, as it is an integral part of the package and is therefore kept up to date with the latest changes to the library.
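As a middle ground (my sketch, assuming the same classic Microsoft.WindowsAzure.Storage library used above), the storage client also lets you scope a retry policy to a single request via TableRequestOptions rather than the whole client:
var requestOptions = new TableRequestOptions
{
    // Applies only to this call, overriding the client-level default
    RetryPolicy = new ExponentialRetry(TimeSpan.FromSeconds(2), 3)
};
table.Execute(TableOperation.InsertOrReplace(entity), requestOptions);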
