How to add pagination and limit to Cosmos DB

I want to specify the number of records to return from the database. The only way I've come up with that allows me to do that, is by setting the MaxItemCount = 1, so that each feed response contains only one result, and then readNext from the iterator the wanted number of times. I don't know enough about RUs and what goes on behind the scenes with cosmos db, but I suspect setting MaxItemCount = 1 is bad practice. So is there any other way to do this?
public async Task<(IEnumerable<T> Results, string ContinuationToken)> ReadAsync(Expression<Func<T, bool>> predicate, int limit, string continuationToken)
{
var items = new List<T>();
var options = new QueryRequestOptions { MaxItemCount = 1 };
IQueryable<T> query = _container.GetItemLinqQueryable<T>(true, continuationToken, options).Where(predicate);
using (var iterator = query.ToFeedIterator())
{
while (iterator.HasMoreResults && (limit == -1 || items.Count < limit))
{
FeedResponse<T> response = await iterator.ReadNextAsync();
items.AddRange(response.Resource);
continuationToken = response.ContinuationToken;
}
}
return (items, continuationToken);
}

Regarding pagination, your approach is correct. Rather than setting MaxItemCount = 1, you could maintain an internal counter of the records still needed and size each request from it, based on the number of documents received so far.
Here's pseudocode to do so:
public async Task<(IEnumerable<T> Results, string ContinuationToken)> ReadAsync(Expression<Func<T, bool>> predicate, int limit, string continuationToken)
{
    var items = new List<T>();
    int itemsRemaining = limit; // Assumes limit is a finite, positive number of items.
    do
    {
        // Request at most 100 items per page, and never more than are still needed.
        int maxItemCount = Math.Min(itemsRemaining, 100);
        var options = new QueryRequestOptions { MaxItemCount = maxItemCount };
        IQueryable<T> query = _container.GetItemLinqQueryable<T>(true, continuationToken, options).Where(predicate);
        using (var iterator = query.ToFeedIterator())
        {
            if (!iterator.HasMoreResults)
            {
                break;
            }
            // Read a single page so the total never exceeds the limit.
            FeedResponse<T> response = await iterator.ReadNextAsync();
            items.AddRange(response.Resource);
            continuationToken = response.ContinuationToken;
        }
        // items accumulates across pages, so recompute from the running total.
        itemsRemaining = limit - items.Count;
    }
    while (itemsRemaining > 0 && continuationToken != null);
    return (items, continuationToken);
}
If you wish to fetch 101 records, the loop will iterate at least twice. The first request asks for up to 100 records and the next asks only for the remainder (101 - 100 = 1). A page may contain fewer items than requested, in which case the loop simply makes another request.

Related

CosmosDB CreateDocumentQuery with EnableCrossPartitionQuery does not return results but does if specifying the PartitionKey

I have a problem where, if I specify the Partition Key for a specific query, I will get the record that I expect. However, if I do not specify a Partition Key and just set EnableCrossPartitionQuery to true, it will not return/find any documents.
This actually works as expected on one of my Cosmos DBs but not on another one. Same records/documents.
I'm using the following setup
Microsoft.Azure.DocumentDB NuGet package version 2.3.0
Cosmos DB Collection with Unlimited capacity (with PartitionKey = ApplicationId)
Three very simple documents on the DB/Collection, created manually through Azure Storage Explorer
My code looks like below, fairly simple. If the caller of GetDocuments passes a value for partitionKey, then I specify it on the query through FeedOptions. Otherwise, I set EnableCrossPartitionQuery on the FeedOptions.
The documents on the Database/Collection that does work are the same as the ones on the DB/Collection that does not work.
I created the Collections in the same way, with the same Partition Key (ApplicationId)
public async Task<IEnumerable<T>> GetDocuments<T>(Expression<Func<T, bool>> predicate, object partitionKey = null)
{
IDocumentQuery<T> queryDetails = QueryDocument<T>(predicate, partitionKey);
var queryData = await queryDetails.ExecuteNextAsync<T>();
if (queryData.Any())
{
return queryData;
}
return default(IEnumerable<T>);
}
private IDocumentQuery<T> QueryDocument<T>(Expression<Func<T, bool>> predicate, object partitionKey = null)
{
FeedOptions feedOptions;
if (partitionKey == null)
{
feedOptions = new FeedOptions { EnableCrossPartitionQuery = true };
}
else
{
feedOptions = new FeedOptions { PartitionKey = new PartitionKey(partitionKey) };
}
var query = _client.CreateDocumentQuery<T>(DocumentCollectionUri, feedOptions);
var queryDetails = query.Where(predicate).AsDocumentQuery();
return queryDetails;
}
The document looks like this:
{
"id": "1",
"HubName": "abxyz-hub",
"ClientId": "abxyz",
"ApplicationId": 1,
"ApplicationName": "My App Name",
"_rid": "hSkpAJde99IBAAAAAAAAAA==",
"_self": "dbs/hSkpAA==/colls/hSkpAJde99I=/docs/hSkpAJde99IBAAAAAAAAAA==/",
"_etag": "\"53007677-0000-0100-0000-5cbb3c660000\"",
"_attachments": "attachments/",
"_ts": 1555774566
}
Any ideas why this does not work?
Your code is wrong. Your mistake is in this check:
if (queryData.Any())
You are returning before you actually get all the data back.
Your code works with a partition key because you are targeting one physical partition (via a logical partition) and the data it contains fits in a single page, that is, it is smaller than the MaxItemCount of the FeedOptions object you provide.
Cosmos DB only returns paginated results, and a cross-partition query has to call every physical partition for its values, sometimes multiple times per partition. In some cases an iteration might return no data even though the next one has some. You must keep calling ExecuteNextAsync until HasMoreResults is false.
Adding a while loop that collects all the paged results from every physical partition your cross-partition query hits will solve the problem. Here is the code:
public async Task<IEnumerable<T>> GetDocuments<T>(Expression<Func<T, bool>> predicate, object partitionKey = null)
{
IDocumentQuery<T> documentQuery = QueryDocument<T>(predicate, partitionKey);
var results = new List<T>();
while(documentQuery.HasMoreResults)
{
var docs = await documentQuery.ExecuteNextAsync<T>();
results.AddRange(docs);
}
return results;
}
private IDocumentQuery<T> QueryDocument<T>(Expression<Func<T, bool>> predicate, object partitionKey = null)
{
FeedOptions feedOptions;
if (partitionKey == null)
{
feedOptions = new FeedOptions { EnableCrossPartitionQuery = true };
}
else
{
feedOptions = new FeedOptions { PartitionKey = new PartitionKey(partitionKey) };
}
var query = _client.CreateDocumentQuery<T>(DocumentCollectionUri, feedOptions);
var queryDetails = query.Where(predicate).AsDocumentQuery();
return queryDetails;
}

Azure Blob : how to separate images and videos

I am using ListBlobsSegmentedAsync in my C# code to list all the blobs. Is there a way I can separate the images and videos in the response of ListBlobsSegmentedAsync?
Here is an example from this link. You should be able to optimise the code to do a yield return which will return results iteratively and not leave your calling code waiting for all the results to be returned.
public static String WildCardToRegular(String value)
{
return "^" + Regex.Escape(value).Replace("\\*", ".*") + "$";
}
Then, using it with ListBlobsSegmentedAsync:
var blobList = await container.ListBlobsSegmentedAsync(blobFilePath, true, BlobListingDetails.None, 1000, token, null, null);
// Keep only block blobs; OfType skips other item types instead of yielding nulls
var items = blobList.Results.OfType<CloudBlockBlob>();
// Filter items by search pattern, if specified
if (!string.IsNullOrEmpty(searchPattern))
{
var pattern = WildCardToRegular(searchPattern);
// Where keeps only the blobs whose file name matches the pattern
items = items.Where(i => Regex.IsMatch(Path.GetFileName(i.Name), pattern, RegexOptions.IgnoreCase)).ToList();
}
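The snippet above filters by a file-name pattern but does not by itself split images from videos. One possible way to do that is to classify each blob name by its extension. The following is a minimal sketch, not a definitive implementation: the BlobMediaClassifier class, the MediaKind enum and both extension lists are hypothetical names and assumptions introduced here for illustration, and it assumes that file extensions are reliable for your blobs.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public enum MediaKind { Image, Video, Other }

public static class BlobMediaClassifier
{
    // Assumed extension lists; extend them to match the files you actually store.
    private static readonly HashSet<string> ImageExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".jpg", ".jpeg", ".png", ".gif", ".bmp" };

    private static readonly HashSet<string> VideoExtensions =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { ".mp4", ".mov", ".avi", ".mkv", ".wmv" };

    // Classifies a blob by the extension of its name (case-insensitive).
    public static MediaKind Classify(string blobName)
    {
        string extension = Path.GetExtension(blobName);
        if (string.IsNullOrEmpty(extension)) return MediaKind.Other;
        if (ImageExtensions.Contains(extension)) return MediaKind.Image;
        if (VideoExtensions.Contains(extension)) return MediaKind.Video;
        return MediaKind.Other;
    }

    // Groups blob names by media kind, keeping their original order within each group.
    public static ILookup<MediaKind, string> Separate(IEnumerable<string> blobNames)
    {
        return blobNames.ToLookup(name => Classify(name));
    }
}
```

With the listing code above, you could pass blobList.Results.OfType<CloudBlockBlob>().Select(b => b.Name) to Separate and then read groups[MediaKind.Image] and groups[MediaKind.Video]. If your blobs are uploaded with a content type, checking Properties.ContentType for an "image/" or "video/" prefix may be more dependable than extensions.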

Azure Cosmos Db, select after row?

I'm trying to select some rows after x rows, something like:
SELECT * from collection WHERE ROWNUM >= 235 and ROWNUM <= 250
Unfortunately it looks like ROWNUM isn't resolved in azure cosmos db.
Is there another way to do this? I've looked at using continuation tokens but it's not helpful if a user skips to page 50, would I need to keep querying with continuation tokens to get to page 50?
I've tried playing around with the page size option but that has some limitations in terms of how many things it can return at any one time.
For example, I have 1,000,000 records in Azure and I want to query rows 500,000 to 500,010. I can't do SELECT * from collection WHERE ROWNUM >= 500,000 and ROWNUM <= 500,010, so how do I achieve this?
If you don't have any filters, you can't retrieve items in a specific range directly via SQL query in Cosmos DB so far. So you need to use pagination to locate the items you want. As far as I know, pagination is currently supported only through continuation tokens.
Please refer to the function below:
using JayGongDocumentDB.pojo;
using Microsoft.Azure.Documents.Client;
using Microsoft.Azure.Documents.Linq;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
namespace JayGongDocumentDB.module
{
class QuerySample1
{
public static async Task QueryPageByPage()
{
// Number of documents per page
const int PAGE_SIZE = 2;
int currentPageNumber = 1;
int documentNumber = 1;
// Continuation token for subsequent queries (NULL for the very first request/page)
string continuationToken = null;
do
{
Console.WriteLine($"----- PAGE {currentPageNumber} -----");
// Loads ALL documents for the current page
KeyValuePair<string, IEnumerable<Student>> currentPage = await QueryDocumentsByPage(currentPageNumber, PAGE_SIZE, continuationToken);
foreach (Student student in currentPage.Value)
{
Console.WriteLine($"[{documentNumber}] {student.Name}");
documentNumber++;
}
// Ensure the continuation token is kept for the next page query execution
continuationToken = currentPage.Key;
currentPageNumber++;
} while (continuationToken != null);
Console.WriteLine("\n--- END: Finished Querying ALL Documents ---");
}
public static async Task<KeyValuePair<string, IEnumerable<Student>>> QueryDocumentsByPage(int pageNumber, int pageSize, string continuationToken)
{
DocumentClient documentClient = new DocumentClient(new Uri("https://***.documents.azure.com:443/"), "***");
var feedOptions = new FeedOptions
{
MaxItemCount = pageSize,
EnableCrossPartitionQuery = true,
// IMPORTANT: Set the continuation token (NULL for the first ever request/page)
RequestContinuation = continuationToken
};
IQueryable<Student> filter = documentClient.CreateDocumentQuery<Student>("dbs/db/colls/item", feedOptions);
IDocumentQuery<Student> query = filter.AsDocumentQuery();
FeedResponse<Student> feedRespose = await query.ExecuteNextAsync<Student>();
List<Student> documents = new List<Student>();
foreach (Student t in feedRespose)
{
documents.Add(t);
}
// IMPORTANT: Ensure the continuation token is kept for the next requests
return new KeyValuePair<string, IEnumerable<Student>>(feedRespose.ResponseContinuation, documents);
}
}
}
Hope it helps you.
Update Answer:
There is no function like ROW_NUMBER() (see "How do I use ROW_NUMBER()?") in Cosmos DB so far. I also considered skip and top. However, top is supported and skip is not yet (see the feedback item). It seems skip is already in progress and will be released in the future.
I think you could vote on the feedback related to the paging function, or use the continuation token approach above as a temporary workaround.
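Later versions of the Cosmos DB SQL API added an OFFSET ... LIMIT clause, which was not available when the answer above was written. If your account supports it, the "rows 500,000 to 500,010" case can be expressed directly in the query text. The helper below is a minimal sketch that only builds such a query string. PagedQueryBuilder and BuildPageQuery are hypothetical names, and the ORDER BY c.id is an assumption added so that pages stay stable between calls.

```csharp
using System;
using System.Globalization;

public static class PagedQueryBuilder
{
    // Builds a query that skips (pageNumber - 1) * pageSize items and returns the next pageSize.
    // pageNumber is 1-based. ORDER BY c.id is an assumption made here for a stable ordering.
    public static string BuildPageQuery(int pageNumber, int pageSize)
    {
        if (pageNumber < 1) throw new ArgumentOutOfRangeException(nameof(pageNumber));
        if (pageSize < 1) throw new ArgumentOutOfRangeException(nameof(pageSize));

        // Use long arithmetic so large page numbers do not overflow int.
        long offset = (long)(pageNumber - 1) * pageSize;

        return string.Format(
            CultureInfo.InvariantCulture,
            "SELECT * FROM c ORDER BY c.id OFFSET {0} LIMIT {1}",
            offset,
            pageSize);
    }
}
```

For example, BuildPageQuery(50001, 10) produces a query with OFFSET 500000 LIMIT 10. The request charge of such a query grows with the offset, because the skipped items still have to be read, so continuation tokens remain the cheaper option when users page through results sequentially.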

Parallel StartsWith Queries against Azure Table Storage

I have the following method that performs a startsWith query on RowKey in a table on Azure Table Storage. I now want to run parallel queries using startsWith on RowKey.
Is it possible to create a parallel method that simply calls my existing method or will I have to create a parallel version of my existing method?
Here's my current startsWith method:
public async Task<IEnumerable<T>> RowKeyStartsWith<T>
(string searchString,
string tableName,
string partitionKey,
string columnName = "RowKey") where T : ITableEntity, new()
{
// Make sure we have a search string
if (string.IsNullOrEmpty(searchString)) return null;
// Get CloudTable
var table = GetTable(tableName);
char lastChar = searchString[searchString.Length - 1];
char nextLastChar = (char)((int)lastChar + 1);
string nextSearchStr = searchString.Substring(0, searchString.Length - 1) + nextLastChar;
// Define query segment(s)
string prefixCondition = TableQuery.CombineFilters(
TableQuery.GenerateFilterCondition(columnName, QueryComparisons.GreaterThanOrEqual, searchString),
TableOperators.And,
TableQuery.GenerateFilterCondition(columnName, QueryComparisons.LessThan, nextSearchStr)
);
string filterString = TableQuery.CombineFilters(
TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, partitionKey),
TableOperators.And,
prefixCondition
);
// Create final query
var query = new TableQuery<T>().Where(filterString);
// Declare result variable
var result = new List<T>();
// Execute query asynchronously
TableContinuationToken continuationToken = null;
do
{
Task<TableQuerySegment<T>> querySegment = table.ExecuteQuerySegmentedAsync(query, continuationToken);
TableQuerySegment<T> segment = await querySegment;
result.AddRange(segment.ToList());
continuationToken = segment.ContinuationToken;
} while (continuationToken != null);
return result;
}
Is it possible to create a parallel method that simply calls my existing method or will I have to create a parallel version of my existing method?
Per my understanding, you could reuse your existing method and execute your queries with multiple tasks as follows:
//for storing the query results
ConcurrentDictionary<string, object> resultDics = new ConcurrentDictionary<string, object>();
//simulate your searching parameters
List<RowKeyStartsWithParamModel> rowKeySearchs = Enumerable.Range(1, 10)
.Select(i => new RowKeyStartsWithParamModel()
{
SearchString = i.ToString(),
TableName = "tablename",
ColumnName = "RowKey", //property names are case-sensitive
ParationKey = "partionKey"
}).ToList();
//create multiple tasks to execute your jobs
var tasks = rowKeySearchs.Select(item => Task.Run(async () =>
{
//invoke your existing RowKeyStartsWith
var results = await RowKeyStartsWith<DynamicTableEntity>(item.SearchString, item.TableName, item.ParationKey, item.ColumnName); //T must implement ITableEntity
//add retrieved results
resultDics.TryAdd(item.SearchString, results);
}));
//synchronously wait all tasks to be executed completely.
Task.WaitAll(tasks.ToArray());
//print all retrieved results
foreach (var item in resultDics)
{
Console.WriteLine($"{item.Key},{JsonConvert.SerializeObject(item.Value)}");
}
Moreover, you could leverage Parallel as follows:
Parallel.ForEach(rowKeySearchs, async (item) =>
{
var results = await RowKeyStartsWith<DynamicTableEntity>(item.SearchString, item.TableName, item.ParationKey, item.ColumnName);
resultDics.TryAdd(item.SearchString, results);
});
Note: Since you use await in the delegate, Parallel.ForEach treats each iteration as finished as soon as it reaches the first await, so the query results are not guaranteed to be available when Parallel.ForEach returns.
To retrieve the results synchronously with the above code snippet, you could use one of the following approaches:
1) Block on the result when invoking RowKeyStartsWith in each iteration of Parallel.ForEach (and drop async from the delegate), as follows:
var results = RowKeyStartsWith<DynamicTableEntity>(item.SearchString, item.TableName, item.ParationKey, item.ColumnName).Result;
2) Use a WaitHandle per query and block until all of the WaitHandles have been signaled.
var waitHandles = rowKeySearchs.Select(d => new EventWaitHandle(false, EventResetMode.ManualReset)).ToArray();
Parallel.ForEach(rowKeySearchs, async (item,loopState,index) =>
{
var results = await RowKeyStartsWith<DynamicTableEntity>(item.SearchString, item.TableName, item.ParationKey, item.ColumnName);
resultDics.TryAdd(item.SearchString, results);
waitHandles[index].Set(); //release
});
WaitHandle.WaitAll(waitHandles); //block the current thread until all EventWaitHandles released
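If the calling code can itself be asynchronous, another option is to start all the queries and await them together with Task.WhenAll, which avoids blocking a thread and needs no wait handles. The following is a minimal sketch under that assumption. ParallelPrefixQueries and RunAllAsync are hypothetical names, and the queryAsync delegate stands in for a call to a method such as RowKeyStartsWith.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ParallelPrefixQueries
{
    // Runs one asynchronous query per distinct search string and awaits them all together.
    // queryAsync is a stand-in for a method such as RowKeyStartsWith; its shape is an assumption.
    public static async Task<Dictionary<string, IReadOnlyList<TResult>>> RunAllAsync<TResult>(
        IEnumerable<string> searchStrings,
        Func<string, Task<IEnumerable<TResult>>> queryAsync)
    {
        // ToList starts every query immediately, so they run concurrently.
        var pending = searchStrings
            .Distinct()
            .Select(async s =>
            {
                IEnumerable<TResult> rows = await queryAsync(s);
                // A null result (for example, from an empty search string) becomes an empty list.
                return new KeyValuePair<string, IReadOnlyList<TResult>>(
                    s, rows?.ToList() ?? new List<TResult>());
            })
            .ToList();

        // Completes when every query has finished; rethrows if any query failed.
        KeyValuePair<string, IReadOnlyList<TResult>>[] completed = await Task.WhenAll(pending);
        return completed.ToDictionary(p => p.Key, p => p.Value);
    }
}
```

A call might look like await ParallelPrefixQueries.RunAllAsync(prefixes, s => RowKeyStartsWith<DynamicTableEntity>(s, "tablename", "partitionKey")), where prefixes is your own list of search strings. Because the results are collected after all tasks complete, no concurrent collection is required.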

Netsuite Transaction search performance

I am using the Netsuite API (version v2016_2) to search data. With the code below, Netsuite takes a long time to respond to the query. I am searching the GL transactions of a particular period, which has 149 MainLine records and 3941 LineItem (Journal Entries) records, and Netsuite takes almost 22 minutes to return this data. Below is the code snippet that I am using to search transactions.
public void GetTransactionData()
{
DataTable dtData = new DataTable();
string errorMsg = "";
LoginToService(ref errorMsg);
TransactionSearch objTransSearch = new TransactionSearch();
TransactionSearchBasic objTransSearchBasic = new TransactionSearchBasic();
SearchEnumMultiSelectField semsf = new SearchEnumMultiSelectField();
semsf.@operator = SearchEnumMultiSelectFieldOperator.anyOf;
semsf.operatorSpecified = true;
semsf.searchValue = new string[] { "Journal" };
objTransSearchBasic.type = semsf;
objTransSearchBasic.postingPeriod = new RecordRef() { internalId = "43" };
objTransSearch.basic = objTransSearchBasic;
//Set Search Preferences
SearchPreferences _searchPreferences = new SearchPreferences();
Preferences _prefs = new Preferences();
_serviceInstance.preferences = _prefs;
_serviceInstance.searchPreferences = _searchPreferences;
_searchPreferences.pageSize = 1000;
_searchPreferences.pageSizeSpecified = true;
_searchPreferences.bodyFieldsOnly = false;
//Set Search Preferences
try
{
SearchResult result = _serviceInstance.search(objTransSearch);
/*
Above line taking almost 22 minutes for below record count
result.recordList.Length = 149
Total JournalEntryLine = 3941
*/
List<JournalEntry> lstJEntry = new List<JournalEntry>();
List<JournalEntryLine> lstLineItems = new List<JournalEntryLine>();
if (result.status.isSuccess)
{
for (int i = 0; i <= result.recordList.Length - 1; i += 1)
{
JournalEntry JEntry = (JournalEntry)result.recordList[i];
lstJEntry.Add((JournalEntry)result.recordList[i]);
if (JEntry.lineList != null)
{
foreach (JournalEntryLine line in JEntry.lineList.line)
{
lstLineItems.Add(line);
}
}
}
}
try
{
_serviceInstance.logout();
}
catch (Exception ex)
{
}
}
catch (Exception ex)
{
throw; // rethrow without resetting the stack trace
}
}
I can't tell whether I am missing something in my code or whether this is down to the data. Please suggest a solution for this.
Thanks.
You should set _searchPreferences.bodyFieldsOnly = true. It improves search performance because the response no longer includes the related or sublist data.
It looks like you are running this search from outside Netsuite to get journal entries or their lines. Instead of searching directly from outside, maintain a RESTlet in Netsuite and call that RESTlet. Run the search you need inside the RESTlet and return the results. Searches that run within Netsuite return results faster.

Resources