Parallel StartsWith Queries against Azure Table Storage - azure

I have the following method that performs a startsWith query on RowKey in a table on Azure Table Storage. I now want to run parallel queries using startsWith on RowKey.
Is it possible to create a parallel method that simply calls my existing method or will I have to create a parallel version of my existing method?
Here's my current startsWith method:
public async Task<IEnumerable<T>> RowKeyStartsWith<T>
(string searchString,
string tableName,
string partitionKey,
string columnName = "RowKey") where T : ITableEntity, new()
{
// Make sure we have a search string
if (string.IsNullOrEmpty(searchString)) return null;
// Get CloudTable
var table = GetTable(tableName);
char lastChar = searchString[searchString.Length - 1];
char nextLastChar = (char)((int)lastChar + 1);
string nextSearchStr = searchString.Substring(0, searchString.Length - 1) + nextLastChar;
// Define query segment(s)
string prefixCondition = TableQuery.CombineFilters(
TableQuery.GenerateFilterCondition(columnName, QueryComparisons.GreaterThanOrEqual, searchString),
TableOperators.And,
TableQuery.GenerateFilterCondition(columnName, QueryComparisons.LessThan, nextSearchStr)
);
string filterString = TableQuery.CombineFilters(
TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, partitionKey),
TableOperators.And,
prefixCondition
);
// Create final query
var query = new TableQuery<T>().Where(filterString);
// Declare result variable
var result = new List<T>();
// Execute query asynchronously
TableContinuationToken continuationToken = null;
do
{
Task<TableQuerySegment<T>> querySegment = table.ExecuteQuerySegmentedAsync(query, continuationToken);
TableQuerySegment<T> segment = await querySegment;
result.AddRange(segment.ToList());
continuationToken = segment.ContinuationToken;
} while (continuationToken != null);
return result;
}
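The range trick used in the method above (RowKey >= prefix AND RowKey < prefix with its last character incremented) can be tried without a storage account. The following is only a sketch written as C# top-level statements (C# 9 or later): PrefixUpperBound and InPrefixRange are hypothetical helper names, not part of the Azure SDK, the in-memory array stands in for a table, and ordinal comparison is used on the assumption that the table service orders keys the same way.

```csharp
using System;
using System.Linq;

// Hypothetical helper: exclusive upper bound for all strings that start with `prefix`
// (ordinal order). Mirrors the lastChar + 1 step in the method above.
static string PrefixUpperBound(string prefix)
{
    if (string.IsNullOrEmpty(prefix))
        throw new ArgumentException("Prefix must not be empty.", nameof(prefix));
    char last = prefix[prefix.Length - 1];
    if (last == char.MaxValue)
        throw new ArgumentException("Last character cannot be incremented.", nameof(prefix));
    return prefix.Substring(0, prefix.Length - 1) + (char)(last + 1);
}

// Hypothetical helper: true when prefix <= key < PrefixUpperBound(prefix).
static bool InPrefixRange(string key, string prefix) =>
    string.CompareOrdinal(key, prefix) >= 0 &&
    string.CompareOrdinal(key, PrefixUpperBound(prefix)) < 0;

string[] rowKeys = { "app", "apple", "apply", "apq", "banana" };
string[] matches = rowKeys.Where(k => InPrefixRange(k, "app")).ToArray();
Console.WriteLine(string.Join(", ", matches)); // prints "app, apple, apply"
```

Note that the increment step has an edge case: if the last character of the search string is char.MaxValue, adding 1 wraps around, which is why the sketch rejects that input.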

Is it possible to create a parallel method that simply calls my existing method or will I have to create a parallel version of my existing method?
As I understand it, you can reuse your existing method and run the queries as multiple tasks, as follows:
//for storing the query results
ConcurrentDictionary<string, object> resultDics = new ConcurrentDictionary<string, object>();
//simulate your search parameters
List<RowKeyStartsWithParamModel> rowKeySearchs = Enumerable.Range(1, 10)
.Select(i => new RowKeyStartsWithParamModel()
{
SearchString = i.ToString(),
TableName = "tablename",
ColumnName = "RowKey",
ParationKey = "partionKey"
}).ToList();
//create multiple tasks to execute your jobs
var tasks = rowKeySearchs.Select(item => Task.Run(async () =>
{
//invoke your existing RowKeyStartsWith
var results=await RowKeyStartsWith<string>(item.SearchString, item.TableName, item.ParationKey, item.ColumnName);
//add retrieved results
resultDics.TryAdd(item.SearchString, results);
}));
//synchronously wait for all tasks to complete
Task.WaitAll(tasks.ToArray());
//print all retrieved results
foreach (var item in resultDics)
{
Console.WriteLine($"{item.Key},{JsonConvert.SerializeObject(item.Value)}");
}
Alternatively, you could use Parallel.ForEach as follows:
Parallel.ForEach(rowKeySearchs, async(item) =>
{
var results = await RowKeyStartsWith<string>(item.SearchString, item.TableName, item.ParationKey, item.ColumnName);
resultDics.TryAdd(item.SearchString, results);
});
Note: Parallel.ForEach does not await an async delegate. Each iteration returns at its first incomplete await, so Parallel.ForEach can finish before the queries have completed, and resultDics may still be empty right after the loop.
To have all results available once the loop finishes with the snippet above, you could use one of the following approaches:
1) Block on the result of RowKeyStartsWith inside each iteration of Parallel.ForEach (and drop async from the delegate):
var results = RowKeyStartsWith<string>(item.SearchString, item.TableName, item.ParationKey, item.ColumnName).Result;
2) Create one WaitHandle per query and block the calling thread until all of them have been set:
var waitHandles = rowKeySearchs.Select(d => new EventWaitHandle(false, EventResetMode.ManualReset)).ToArray();
Parallel.ForEach(rowKeySearchs, async (item,loopState,index) =>
{
var results = await RowKeyStartsWith<string>(item.SearchString, item.TableName, item.ParationKey, item.ColumnName);
resultDics.TryAdd(item.SearchString, results);
waitHandles[index].Set(); //release
});
WaitHandle.WaitAll(waitHandles); //block the current thread until all EventWaitHandles released
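If the calling code can itself be asynchronous, an alternative to blocking with Task.WaitAll or wait handles is to start one task per search and await Task.WhenAll. The sketch below is self-contained and written as C# top-level statements (C# 9 or later): FakeRowKeyStartsWith is a hypothetical stand-in for RowKeyStartsWith so that it runs without a storage account. Task.Run is not used here because the query method is already asynchronous.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for RowKeyStartsWith: filters an in-memory list after a short delay.
static async Task<List<string>> FakeRowKeyStartsWith(string prefix)
{
    string[] rowKeys = { "10", "11", "20", "21", "22", "30" };
    await Task.Delay(50); // simulates network latency
    return rowKeys.Where(k => k.StartsWith(prefix, StringComparison.Ordinal)).ToList();
}

string[] prefixes = { "1", "2", "3" };

// Start all queries first so they run concurrently, then await them together.
Task<List<string>>[] queries = prefixes.Select(p => FakeRowKeyStartsWith(p)).ToArray();
List<string>[] results = await Task.WhenAll(queries);

// Task.WhenAll returns the results in the same order as the input tasks.
for (int i = 0; i < prefixes.Length; i++)
{
    Console.WriteLine($"{prefixes[i]}: {string.Join(",", results[i])}");
}
```

Because the results come back as an array in input order, this version does not need a ConcurrentDictionary to collect them.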

Related

Program terminates when calling WorkItemTrackingHttpClient.QueryByWiqlAsync()

I am working on a program that gets a list of work items in the Committed state from Azure DevOps for a specific area path and iteration path. My code is based on an example found at the following link: https://learn.microsoft.com/en-us/azure/devops/integrate/quickstarts/work-item-quickstart?view=azure-devops
The issue I am running into is that when QueryByWiqlAsync() is called, the program terminates without reporting any error that explains why. Below is the code in question. I tried calling QueryByWiqlAsync() with and without ConfigureAwait(false), and that did not seem to make a difference. Any suggestions on what to try or what to fix are appreciated.
static async void GetWorkItemsToTaskFromADO(string tfs_project, string accessToken)
{
var credentials = new VssBasicCredential(string.Empty, accessToken);
var wiql = new Wiql()
{
Query = @"Select [Id] From WorkItems WHERE [System.TeamProject] = 'SampleADOProject' AND [System.AreaPath] = 'Sample\ADO\AreaPath' AND [System.IterationPath] = 'Sample\ADO\IterationPath' AND [System.State] = 'Committed'"
};
using (var httpClient = new WorkItemTrackingHttpClient(new Uri(tfs_project), credentials))
{
try
{
var result = await httpClient.QueryByWiqlAsync(wiql).ConfigureAwait(false);
var ids = result.WorkItems.Select(item => item.Id).ToArray();
var fields = new[] { "System.Id", "System.Title", "System.State" };
var workItems = await httpClient.GetWorkItemsAsync(ids, fields, result.AsOf).ConfigureAwait(false);
// output results to test what came back...
foreach (var workItem in workItems)
{
Console.WriteLine(
"{0}\t{1}\t{2}",
workItem.Id,
workItem.Fields["System.Title"],
workItem.Fields["System.State"]
);
}
}
catch(Exception ex)
{
Console.WriteLine(ex.Message);
Console.Read();
}
}
}

How to add pagination and limit to cosmos db

I want to specify the number of records to return from the database. The only way I've come up with that allows me to do that, is by setting the MaxItemCount = 1, so that each feed response contains only one result, and then readNext from the iterator the wanted number of times. I don't know enough about RUs and what goes on behind the scenes with cosmos db, but I suspect setting MaxItemCount = 1 is bad practice. So is there any other way to do this?
public async Task<(IEnumerable<T> Results, string ContinuationToken)> ReadAsync(Expression<Func<T, bool>> predicate, int limit, string continuationToken)
{
var items = new List<T>();
var options = new QueryRequestOptions { MaxItemCount = 1 };
IQueryable<T> query = _container.GetItemLinqQueryable<T>(true, continuationToken, options).Where(predicate);
using (var iterator = query.ToFeedIterator())
{
while (iterator.HasMoreResults && (limit == -1 || items.Count < limit))
{
FeedResponse<T> response = await iterator.ReadNextAsync();
items.AddRange(response.Resource);
continuationToken = response.ContinuationToken;
}
}
return (items, continuationToken);
}
Regarding pagination, your approach is correct. What you could do is maintain an internal records counter and adjust it based on the number of documents received.
Here's the pseudo code to do so:
public async Task<(IEnumerable<T> Results, string ContinuationToken)> ReadAsync(Expression<Func<T, bool>> predicate, int limit, string continuationToken)
{
var items = new List<T>();
int itemsRemaining = limit; // Let's say you want to fetch a finite number of items.
do
{
// Request at most 100 items per page, and never more than are still needed
int maxItemCount = Math.Min(itemsRemaining, 100);
var options = new QueryRequestOptions { MaxItemCount = maxItemCount };
IQueryable<T> query = _container.GetItemLinqQueryable<T>(true, continuationToken, options).Where(predicate);
using (var iterator = query.ToFeedIterator())
{
if (!iterator.HasMoreResults) break;
// Read a single page, then resume from its continuation token on the next pass
FeedResponse<T> response = await iterator.ReadNextAsync();
items.AddRange(response.Resource);
// Subtract only the items received in this page, not the running total
itemsRemaining -= response.Count;
continuationToken = response.ContinuationToken;
}
}
// Stop when enough items were read or the query has no more results
while (itemsRemaining > 0 && continuationToken != null);
return (items, continuationToken);
}
In the case where you wish to fetch 101 records, your code will iterate at least twice. First time it will fetch 100 records and the next time it will only fetch 1 record (101 - 100).

azure table storage pagination for request 10 items each time

Basically, I am trying to get pagination working when requesting entities from Azure Table Storage, i.e. pressing the Next button gets the next 10 entities and pressing the Previous button gets the previous 10 entities. A relatively close example is Gaurav Mantri's answer. But my question is: how do I get the nextPartitionKey and nextRowKey from an HTML button attribute and store them in an array/list to keep track of the current page, so I can get the next/previous items? A code example would be very much appreciated.
Thanks!
This is what I have right now; it gets a range of data based on the requested pageNumber:
private async Task<List<UserInfo>> queryPage(CloudTable peopleTable, string item, int pageNumber)
{
// Construct the query operation for all customer entities
TableQuery<CustomerEntity> query = new TableQuery<CustomerEntity>().Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, item));
// Print the fields for each customer.
TableContinuationToken token = null;
//TodoItem data = new TodoItem();
List<UserInfo> data = new List<UserInfo>();
do
{
TableQuerySegment<CustomerEntity> resultSegment = await peopleTable.ExecuteQuerySegmentedAsync(query, token);
token = resultSegment.ContinuationToken;
foreach (CustomerEntity entity in resultSegment.Results)
{
data.Add(new UserInfo
{
// add data
});
}
} while (token != null);
//get a subset of all entity
List<UserInfo> sublist = data.GetRange(0, pageNumber);
return sublist;
}
I managed to solve the problem with Gaurav's help.
Here is the code; it's not perfect, but it works.
private async Task<List<UserInfo>> queryPage(CloudTable peopleTable, string item, string NextPartitionKey , string NextRowKey, int itemNumber)
{
// Construct the query operation for all customer entities
TableQuery<CustomerEntity> query = new TableQuery<CustomerEntity>().Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, item)).Take(itemNumber);
// Print the fields for each customer.
List<UserInfo> data = new List<UserInfo>();
Tabletoken.NextPartitionKey = NextPartitionKey;
Tabletoken.NextRowKey = NextRowKey;
TableQuerySegment<CustomerEntity> resultSegment = await peopleTable.ExecuteQuerySegmentedAsync(query, Tabletoken);
Tabletoken = resultSegment.ContinuationToken;
foreach (CustomerEntity entity in resultSegment.Results)
{
data.Add(new UserInfo
{
//add data
});
}
return data;
}
private TableContinuationToken Tabletoken = new TableContinuationToken();
Then return the results together with the next keys using a tuple:
Tuple<List<UserInfo>, string, string > tuple =
new Tuple<List<UserInfo>, string, string>(data, Tabletoken.NextPartitionKey, Tabletoken.NextRowKey);
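For the Previous button, one option is to remember the token that was used to fetch each page. The sketch below illustrates that bookkeeping in memory only, written as C# top-level statements (C# 9 or later). FetchPage is a hypothetical stand-in for the segmented table query, and its string token stands in for the NextPartitionKey/NextRowKey pair; in the real case each stored entry would be that pair.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a segmented query: the token is just the start index as a string
// (null means "start from the beginning"); a null next token means there are no more pages.
static (List<int> Items, string NextToken) FetchPage(string token, int pageSize)
{
    List<int> all = Enumerable.Range(1, 25).ToList();
    int start = token == null ? 0 : int.Parse(token);
    List<int> items = all.Skip(start).Take(pageSize).ToList();
    int next = start + items.Count;
    return (items, next < all.Count ? next.ToString() : null);
}

// tokens[i] is the token needed to fetch page i; page 0 always uses null.
var tokens = new List<string> { null };
int current = 0;

List<int> Show(int page)
{
    var (items, nextToken) = FetchPage(tokens[page], 10);
    // Remember the token for the following page the first time this page is visited.
    if (page == tokens.Count - 1 && nextToken != null) tokens.Add(nextToken);
    current = page;
    Console.WriteLine($"page {page}: {string.Join(",", items)}");
    return items;
}

List<int> first = Show(0);             // initial load
List<int> second = Show(current + 1);  // "Next"
List<int> back = Show(current - 1);    // "Previous" reuses the stored token
```

Going back never needs a reverse query: the token that produced a page is simply replayed. The list could be kept in session state or round-tripped through hidden form fields, depending on the application.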

Does using QueueClient.OnMessage inside an asynchronous method make sense?

I am calling an async method InsertOperation from an async method ConfigureConnectionString. Am I using the client.OnMessage call correctly? I want to process the messages in a queue asynchronously and then store them to the queue storage.
private static async void ConfigureConnectionString()
{
var connectionString =
"myconnstring";
var queueName = "myqueue";
CloudStorageAccount storageAccount = CloudStorageAccount.Parse(CloudConfigurationManager.GetSetting("StorageConnectionString"));
CloudTableClient tableClient = storageAccount.CreateCloudTableClient();
CloudTable table = tableClient.GetTableReference("test");
table.CreateIfNotExists();
Stopwatch sw = Stopwatch.StartNew();
await Task.Run(() => InsertOperation(connectionString, queueName, table));
sw.Stop();
Console.WriteLine("ElapsedTime " + sw.Elapsed.TotalMinutes + " minutes.");
}
private static async Task InsertOperation(string connectionString, string queueName, CloudTable table)
{
var client = QueueClient.CreateFromConnectionString(connectionString, queueName);
client.OnMessage(message =>
{
var bodyJson = new StreamReader(message.GetBody<Stream>(), Encoding.UTF8).ReadToEnd();
var myMessage = JsonConvert.DeserializeObject<VerifyVariable>(bodyJson);
Console.WriteLine();
var VerifyVariableEntityObject = new VerifyVariableEntity()
{
ConsumerId = myMessage.ConsumerId,
Score = myMessage.Score,
PartitionKey = myMessage.ConsumerId,
RowKey = myMessage.Score
};
});
}
The OnMessageAsync method provides an asynchronous programming model that lets us process each message asynchronously.
client.OnMessageAsync(message =>
{
return Task.Factory.StartNew(() => ProcessMessage(message));
//you could perform the table and queue storage operations in the ProcessMessage method
}, options);
Without understanding the actual logic you want to achieve, it looks like you are not using OnMessage correctly.
OnMessage is a way to set up the queue client behavior for a long-running client. It makes sense, for example, if you have a singleton instance in your application. In that case, you are specifying to the client how you want to handle any messages that are put in the queue.
In your example, however, you create the client, set up the OnMessage, and don't persist the client, so it effectively doesn't get anything accomplished.

How to Insert/Update into Azure Table using Windows Azure SDK 2.0

I have multiple entities to be stored in the same physical Azure table. I'm trying to Insert/Merge the table entries from a file, and I'm looking for a way to do this without serializing each property or creating custom entity classes.
While trying the following code, I thought maybe I could use generic DynamicTableEntity. However, I'm not sure if it helps in an insert operation (most documentation is for replace/merge operations).
The error I get is
HResult=-2146233088
Message=Unexpected response code for operation : 0
Source=Microsoft.WindowsAzure.Storage
Any help is appreciated.
Here's an excerpt of my code
_tableClient = storageAccount.CreateCloudTableClient();
_table = _tableClient.GetTableReference("CloudlyPilot");
_table.CreateIfNotExists();
TableBatchOperation batch = new TableBatchOperation();
....
foreach (var pkGroup in result.Elements("PartitionGroup"))
{
foreach (var entity in pkGroup.Elements())
{
DynamicTableEntity tableEntity = new DynamicTableEntity();
string partitionKey = entity.Elements("PartitionKey").FirstOrDefault().Value;
string rowKey = entity.Elements("RowKey").FirstOrDefault().Value;
Dictionary<string, EntityProperty> props = new Dictionary<string, EntityProperty>();
//if (pkGroup.Attribute("name").Value == "CloudServices Page")
//{
// tableEntity = new CloudServicesGroupEntity (partitionKey, rowKey);
//}
//else
//{
// tableEntity = new CloudServiceDetailsEntity(partitionKey,rowKey);
//}
foreach (var element in entity.Elements())
{
tableEntity.Properties[element.Name.ToString()] = new EntityProperty(element.Value.ToString());
}
tableEntity.ETag = Guid.NewGuid().ToString();
tableEntity.Timestamp = new DateTimeOffset(DateTime.Now.ToUniversalTime());
//tableEntity.WriteEntity(/*WHERE TO GET AN OPERATION CONTEXT FROM?*/)
batch.InsertOrMerge(tableEntity);
}
_table.ExecuteBatch(batch);
batch.Clear();
}
Have you tried using DictionaryTableEntity? This class allows you to dynamically fill the entity as if it were a dictionary (similar to DynamicTableEntity). I tried something like your code and it works:
var batch = new TableBatchOperation();
var entity1 = new DictionaryTableEntity();
entity1.PartitionKey = "abc";
entity1.RowKey = Guid.NewGuid().ToString();
entity1.Add("name", "Steve");
batch.InsertOrMerge(entity1);
var entity2 = new DictionaryTableEntity();
entity2.PartitionKey = "abc";
entity2.RowKey = Guid.NewGuid().ToString();
entity2.Add("name", "Scott");
batch.InsertOrMerge(entity2);
table.ExecuteBatch(batch);
var entities = table.ExecuteQuery<DictionaryTableEntity>(new TableQuery<DictionaryTableEntity>());
One last thing, I see that you're setting the Timestamp and ETag yourself. Remove these two lines and try again.
