I'm trying to select some rows after x rows, something like:
SELECT * from collection WHERE ROWNUM >= 235 and ROWNUM <= 250
Unfortunately, it looks like ROWNUM isn't recognized in Azure Cosmos DB.
Is there another way to do this? I've looked at using continuation tokens, but they don't help if a user skips to page 50. Would I need to keep querying with continuation tokens to get to page 50?
I've tried playing around with the page size option, but that has some limitations in terms of how many items it can return at any one time.
For example, I have 1,000,000 records in Azure and I want to query rows 500,000 to 500,010. I can't do SELECT * from collection WHERE ROWNUM >= 500,000 and ROWNUM <= 500,010, so how do I achieve this?
If you don't have any filters, you can't retrieve items in a specific range directly via a SQL query in Cosmos DB so far. So you need to use pagination to locate the items you want. As far as I know, pagination is currently supported only through continuation tokens.
Please refer to the sample below:
using JayGongDocumentDB.pojo;
using Microsoft.Azure.Documents.Client;
using Microsoft.Azure.Documents.Linq;
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
namespace JayGongDocumentDB.module
{
class QuerySample1
{
public static async Task QueryPageByPage()
{
// Number of documents per page
const int PAGE_SIZE = 2;
int currentPageNumber = 1;
int documentNumber = 1;
// Continuation token for subsequent queries (NULL for the very first request/page)
string continuationToken = null;
do
{
Console.WriteLine($"----- PAGE {currentPageNumber} -----");
// Loads ALL documents for the current page
KeyValuePair<string, IEnumerable<Student>> currentPage = await QueryDocumentsByPage(currentPageNumber, PAGE_SIZE, continuationToken);
foreach (Student student in currentPage.Value)
{
Console.WriteLine($"[{documentNumber}] {student.Name}");
documentNumber++;
}
// Ensure the continuation token is kept for the next page query execution
continuationToken = currentPage.Key;
currentPageNumber++;
} while (continuationToken != null);
Console.WriteLine("\n--- END: Finished Querying ALL Documents ---");
}
public static async Task<KeyValuePair<string, IEnumerable<Student>>> QueryDocumentsByPage(int pageNumber, int pageSize, string continuationToken)
{
DocumentClient documentClient = new DocumentClient(new Uri("https://***.documents.azure.com:443/"), "***");
var feedOptions = new FeedOptions
{
MaxItemCount = pageSize,
EnableCrossPartitionQuery = true,
// IMPORTANT: Set the continuation token (NULL for the first ever request/page)
RequestContinuation = continuationToken
};
IQueryable<Student> filter = documentClient.CreateDocumentQuery<Student>("dbs/db/colls/item", feedOptions);
IDocumentQuery<Student> query = filter.AsDocumentQuery();
FeedResponse<Student> feedResponse = await query.ExecuteNextAsync<Student>();
List<Student> documents = new List<Student>();
foreach (Student t in feedResponse)
{
documents.Add(t);
}
// IMPORTANT: Ensure the continuation token is kept for the next requests
return new KeyValuePair<string, IEnumerable<Student>>(feedResponse.ResponseContinuation, documents);
}
}
}
Hope it helps you.
Update Answer:
Cosmos DB has no function like ROW_NUMBER() so far (see "How do I use ROW_NUMBER()?"). I also considered skip and top. However, top is supported and skip is not yet (see the feedback item). It seems skip is already in progress and will be released in the future.
I think you could vote on the feedback item related to paging, or use the continuation token approach above as a temporary workaround.
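To make the cost of this workaround concrete for the "skip to page 50" case: with only continuation tokens, every earlier page has to be requested in order. Here is a minimal sketch of that walk. It assumes a hypothetical in-memory FetchPage function standing in for the Cosmos DB query, and it uses a plain integer offset as the token, which is not what a real continuation token looks like.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PageSkipSketch
{
    // Hypothetical stand-in for one query round trip: returns up to pageSize
    // items plus an opaque token, or a null token once the data is exhausted.
    public static (List<int> Items, string Token) FetchPage(
        IReadOnlyList<int> source, int pageSize, string token)
    {
        int start = token == null ? 0 : int.Parse(token);
        List<int> items = source.Skip(start).Take(pageSize).ToList();
        int next = start + items.Count;
        return (items, next < source.Count ? next.ToString() : null);
    }

    // Walks forward page by page until the requested 1-based page is reached.
    // Returns an empty list when the page lies past the end of the data.
    public static List<int> GetPage(IReadOnlyList<int> source, int pageSize, int pageNumber)
    {
        string token = null;
        for (int current = 1; ; current++)
        {
            var (items, next) = FetchPage(source, pageSize, token);
            if (current == pageNumber) return items;
            if (next == null) return new List<int>();
            token = next;
        }
    }
}
```

Reaching page 50 costs 50 round trips in this sketch. If users tend to revisit pages, caching the token that starts each page already visited avoids repeating that walk.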
Related
I want to specify the number of records to return from the database. The only way I've come up with that allows me to do that, is by setting the MaxItemCount = 1, so that each feed response contains only one result, and then readNext from the iterator the wanted number of times. I don't know enough about RUs and what goes on behind the scenes with cosmos db, but I suspect setting MaxItemCount = 1 is bad practice. So is there any other way to do this?
public async Task<(IEnumerable<T> Results, string ContinuationToken)> ReadAsync(Expression<Func<T, bool>> predicate, int limit, string continuationToken)
{
var items = new List<T>();
var options = new QueryRequestOptions { MaxItemCount = 1 };
IQueryable<T> query = _container.GetItemLinqQueryable<T>(true, continuationToken, options).Where(predicate);
using (var iterator = query.ToFeedIterator())
{
while (iterator.HasMoreResults && (limit == -1 || items.Count < limit))
{
FeedResponse<T> response = await iterator.ReadNextAsync();
items.AddRange(response.Resource);
continuationToken = response.ContinuationToken;
}
}
return (items, continuationToken);
}
Regarding pagination, your approach is correct. What you could do is maintain an internal records counter and adjust it based on the number of documents received.
Here's the pseudo code to do so:
public async Task<(IEnumerable<T> Results, string ContinuationToken)> ReadAsync(Expression<Func<T, bool>> predicate, int limit, string continuationToken)
{
    var items = new List<T>();
    int itemsRemaining = limit; // Assumes limit is a finite, positive number of items to fetch.
    do
    {
        // Request at most 100 items per page, and never more than are still needed.
        int maxItemCount = Math.Min(itemsRemaining, 100);
        var options = new QueryRequestOptions { MaxItemCount = maxItemCount };
        IQueryable<T> query = _container.GetItemLinqQueryable<T>(true, continuationToken, options).Where(predicate);
        using (var iterator = query.ToFeedIterator())
        {
            // Read one page per query so the page size can be adjusted for the next request.
            FeedResponse<T> response = await iterator.ReadNextAsync();
            items.AddRange(response.Resource);
            continuationToken = response.ContinuationToken;
        }
        // Recompute from the running total; a page may hold fewer items than requested.
        itemsRemaining = limit - items.Count;
    }
    while (itemsRemaining > 0 && continuationToken != null); // Stop when done or when no data is left.
    return (items, continuationToken);
}
In the case where you wish to fetch 101 records, your code will iterate at least twice. First time it will fetch 100 records and the next time it will only fetch 1 record (101 - 100).
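The counting logic can be checked without a Cosmos DB account. Below is a minimal sketch, assuming a hypothetical fetchPage delegate that stands in for a single ReadNextAsync round trip and an integer offset used as the token. Neither is part of the Cosmos DB SDK.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LimitedReadSketch
{
    public const int MaxPageSize = 100;

    // fetchPage(pageSize, token) is a hypothetical stand-in for one query round trip.
    // Returns the collected items, the token to resume from, and every page size requested.
    public static (List<T> Items, string Token, List<int> Requests) Read<T>(
        Func<int, string, (List<T> Items, string Token)> fetchPage, int limit, string token)
    {
        var items = new List<T>();
        var requests = new List<int>();
        int remaining = limit;
        while (remaining > 0)
        {
            int pageSize = Math.Min(remaining, MaxPageSize);
            requests.Add(pageSize);
            var page = fetchPage(pageSize, token);
            items.AddRange(page.Items);
            token = page.Token;
            remaining = limit - items.Count;
            if (token == null) break; // The source has no more data.
        }
        return (items, token, requests);
    }

    // In-memory page source used only for illustration; the token is the next offset.
    public static Func<int, string, (List<int> Items, string Token)> InMemorySource(
        IReadOnlyList<int> data)
    {
        return (pageSize, token) =>
        {
            int start = token == null ? 0 : int.Parse(token);
            List<int> items = data.Skip(start).Take(pageSize).ToList();
            int next = start + items.Count;
            return (items, next < data.Count ? next.ToString() : null);
        };
    }
}
```

With 500 items in the source and a limit of 101, the sketch requests a page of 100 and then a page of 1, matching the description above.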
What are the limits of calling ExecuteQuery()? For example, limits on the number of entities and download size.
In other words, when will the method below hit its limits?
private static void ExecuteSimpleQuery(CloudTable table, string partitionKey, string startRowKey, string endRowKey)
{
try
{
// Create the range query using the fluent API
TableQuery<CustomerEntity> rangeQuery = new TableQuery<CustomerEntity>().Where(
TableQuery.CombineFilters(
TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, partitionKey),
TableOperators.And,
TableQuery.CombineFilters(
TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.GreaterThanOrEqual, startRowKey),
TableOperators.And,
TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.LessThanOrEqual, endRowKey))));
foreach (CustomerEntity entity in table.ExecuteQuery(rangeQuery))
{
Console.WriteLine("Customer: {0},{1}\t{2}\t{3}", entity.PartitionKey, entity.RowKey, entity.Email, entity.PhoneNumber);
}
}
catch (StorageException e)
{
Console.WriteLine(e.Message);
Console.ReadLine();
throw;
}
}
The method below uses ExecuteQuerySegmentedAsync with a TakeCount of 50, but how is the 50 determined? I think it depends on the answers to my questions above.
private static async Task PartitionRangeQueryAsync(CloudTable table, string partitionKey, string startRowKey, string endRowKey)
{
try
{
// Create the range query using the fluent API
TableQuery<CustomerEntity> rangeQuery = new TableQuery<CustomerEntity>().Where(
TableQuery.CombineFilters(
TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, partitionKey),
TableOperators.And,
TableQuery.CombineFilters(
TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.GreaterThanOrEqual, startRowKey),
TableOperators.And,
TableQuery.GenerateFilterCondition("RowKey", QueryComparisons.LessThanOrEqual, endRowKey))));
// Request 50 results at a time from the server.
TableContinuationToken token = null;
rangeQuery.TakeCount = 50;
int segmentNumber = 0;
do
{
// Execute the query, passing in the continuation token.
// The first time this method is called, the continuation token is null. If there are more results, the call
// populates the continuation token for use in the next call.
TableQuerySegment<CustomerEntity> segment = await table.ExecuteQuerySegmentedAsync(rangeQuery, token);
// Indicate which segment is being displayed
if (segment.Results.Count > 0)
{
segmentNumber++;
Console.WriteLine();
Console.WriteLine("Segment {0}", segmentNumber);
}
// Save the continuation token for the next call to ExecuteQuerySegmentedAsync
token = segment.ContinuationToken;
// Write out the properties for each entity returned.
foreach (CustomerEntity entity in segment)
{
Console.WriteLine("\t Customer: {0},{1}\t{2}\t{3}", entity.PartitionKey, entity.RowKey, entity.Email, entity.PhoneNumber);
}
Console.WriteLine();
}
while (token != null);
}
catch (StorageException e)
{
Console.WriteLine(e.Message);
Console.ReadLine();
throw;
}
}
Examples are from the link below:
https://github.com/Azure-Samples/storage-table-dotnet-getting-started
For ExecuteQuerySegmentedAsync, the limit is 1000 entities per call. This follows from a limitation imposed by the REST API, where a single request to the table service can return a maximum of 1000 entities (Ref: https://learn.microsoft.com/en-us/rest/api/storageservices/query-timeout-and-pagination).
The ExecuteQuery method will try to return all entities matching a query. Internally, it fetches a maximum of 1000 entities in a single iteration and fetches the next set of entities if the response from the table service includes a continuation token.
UPDATE
If ExecuteQuery performs pagination automatically, it seems it is easier to use than ExecuteQuerySegmentedAsync. Why must I use ExecuteQuerySegmentedAsync? What about download size? 1000 entities regardless of their sizes?
With ExecuteQuery, there's no way for you to break out of the loop. This becomes problematic when you have a lot of entities in a table. You have that flexibility with ExecuteQuerySegmentedAsync. For example, let's assume you want to download all entities from a very large table and save them locally. If you use ExecuteQuerySegmentedAsync, you can save the entities in different files.
Regarding your comment about 1000 entities regardless of the size, the answer is yes. Please keep in mind that the maximum size of each entity is 1 MB.
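To illustrate the difference in control, here is a minimal sketch of a segment-by-segment loop that keeps each segment as its own batch and can stop early. It does not call the storage SDK: ExecuteSegment is a hypothetical in-memory stand-in for one ExecuteQuerySegmentedAsync call, and it assumes the 1000-entity cap described above applies per segment.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SegmentedSketch
{
    public const int ServerMaxPerSegment = 1000;

    // Hypothetical stand-in for one segmented call: each segment holds at most
    // min(takeCount, 1000) entities, and the token is null after the last one.
    public static (List<int> Results, int? Token) ExecuteSegment(
        IReadOnlyList<int> table, int takeCount, int? token)
    {
        int start = token ?? 0;
        int size = Math.Min(takeCount, ServerMaxPerSegment);
        List<int> results = table.Skip(start).Take(size).ToList();
        int next = start + results.Count;
        return (results, next < table.Count ? next : (int?)null);
    }

    // Collects segments until the table is exhausted or maxSegments is reached,
    // keeping each segment separate (for example, to write one file per batch).
    public static List<List<int>> ReadBatches(
        IReadOnlyList<int> table, int takeCount, int maxSegments)
    {
        var batches = new List<List<int>>();
        int? token = null;
        do
        {
            var (results, next) = ExecuteSegment(table, takeCount, token);
            batches.Add(results);
            token = next;
        } while (token != null && batches.Count < maxSegments);
        return batches;
    }
}
```

In this sketch a 2,500-entity table read with a large take count comes back as batches of 1000, 1000, and 500, and lowering maxSegments stops the loop before the table is exhausted.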
Basically, I am trying to get pagination working when requesting entities from Azure Table storage, i.e. pressing the next button gets the next 10 entities and pressing the previous button gets the previous 10 entities. A relatively close example is Gaurav Mantri's answer. But my question is: how do I get the nextPartitionKey and nextRowKey from an HTML button attribute and store them in an array/list to keep track of the current page, so I can get the next/previous items? A code example would be much appreciated.
Thanks!
This is what I have right now, which gets a range of data based on the pageNumber requested:
private async Task<List<UserInfo>> queryPage(CloudTable peopleTable, string item, int pageNumber)
{
// Construct the query operation for all customer entities
TableQuery<CustomerEntity> query = new TableQuery<CustomerEntity>().Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, item));
// Print the fields for each customer.
TableContinuationToken token = null;
//TodoItem data = new TodoItem();
List<UserInfo> data = new List<UserInfo>();
do
{
TableQuerySegment<CustomerEntity> resultSegment = await peopleTable.ExecuteQuerySegmentedAsync(query, token);
token = resultSegment.ContinuationToken;
foreach (CustomerEntity entity in resultSegment.Results)
{
data.Add(new UserInfo
{
// add data
});
}
} while (token != null);
//get a subset of all entity
List<UserInfo> sublist = data.GetRange(0, pageNumber);
return sublist;
}
Managed to solve the problem with Gaurav's help.
Here is the code, not perfect but works.
private async Task<List<UserInfo>> queryPage(CloudTable peopleTable, string item, string NextPartitionKey , string NextRowKey, int itemNumber)
{
// Construct the query operation for all customer entities
TableQuery<CustomerEntity> query = new TableQuery<CustomerEntity>().Where(TableQuery.GenerateFilterCondition("PartitionKey", QueryComparisons.Equal, item)).Take(itemNumber);
// Print the fields for each customer.
List<UserInfo> data = new List<UserInfo>();
Tabletoken.NextPartitionKey = NextPartitionKey;
Tabletoken.NextRowKey = NextRowKey;
TableQuerySegment<CustomerEntity> resultSegment = await peopleTable.ExecuteQuerySegmentedAsync(query, Tabletoken);
Tabletoken = resultSegment.ContinuationToken;
foreach (CustomerEntity entity in resultSegment.Results)
{
data.Add(new UserInfo
{
//add data
});
}
return data;
}
private TableContinuationToken Tabletoken = new TableContinuationToken();
and return the results together with the token values using a tuple:
Tuple<List<UserInfo>, string, string > tuple =
new Tuple<List<UserInfo>, string, string>(data, Tabletoken.NextPartitionKey, Tabletoken.NextRowKey);
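For the previous button, the token values returned with each page need to be remembered somewhere. One possible way is a small history list, sketched below. PageHistory is a hypothetical helper, not part of the storage SDK, and it assumes the two keys are joined into a single "partitionKey|rowKey" string before being stored (for example in session state or a hidden field).

```csharp
using System;
using System.Collections.Generic;

public sealed class PageHistory
{
    // _tokens[i] is the token that loads page i; page 0 always starts from null.
    private readonly List<string> _tokens = new List<string> { null };
    private int _current;

    public int CurrentPage => _current;
    public string CurrentToken => _tokens[_current];

    // Call after loading the current page, passing the token returned with it
    // (null when there is no further page). Tokens already known are not re-added.
    public void RecordNext(string nextToken)
    {
        if (_current == _tokens.Count - 1 && nextToken != null)
        {
            _tokens.Add(nextToken);
        }
    }

    // Moves to the next page if its starting token is known.
    public bool MoveNext()
    {
        if (_current + 1 >= _tokens.Count) return false;
        _current++;
        return true;
    }

    // Moves back one page; the first page has no predecessor.
    public bool MovePrevious()
    {
        if (_current == 0) return false;
        _current--;
        return true;
    }
}
```

The next and previous buttons then call MoveNext or MovePrevious and query with CurrentToken, split back into NextPartitionKey and NextRowKey.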
After upgrading to the new storage API version 4.2, I'm getting the following warning that I'm calling obsolete methods on some of my segmented queries.
'Microsoft.WindowsAzure.Storage.Table.CloudTableClient.GetTableServiceContext()'
is obsolete: 'Support for accessing Windows Azure Tables via WCF Data
Services is now obsolete. It's recommended that you use the
Microsoft.WindowsAzure.Storage.Table namespace for working with
tables.'
So far I haven't been able to figure out how to achieve this on the new API, and no examples have been put out that I have been able to find. The legacy code still runs fine, but if the new API supports something better I'd love to check it out and get rid of this warning. Could someone point me in the right direction on how a segmented query like this would look using the new API?
Here is what my code currently looks like with the warning:
public AzureTablePage<T> GetPagedResults<T>(Expression<Func<T, bool>> whereCondition, string ContinuationToken, int PageSize, string TableName) {
TableContinuationToken token = GetToken(ContinuationToken);
var query = AzureTableService.CreateQuery<T>(TableName).Where(whereCondition).Take(PageSize).AsTableServiceQuery(AzureTableClient.GetTableServiceContext());
var results = query.ExecuteSegmented(token, new TableRequestOptions() { PayloadFormat = TablePayloadFormat.JsonNoMetadata });
if (results.ContinuationToken != null) {
return new AzureTablePage<T>() { Results = results.ToList(), HasMoreResults = true, ContinuationToken = string.Join("|", results.ContinuationToken.NextPartitionKey, results.ContinuationToken.NextRowKey) };
} else {
return new AzureTablePage<T>() { Results = results.ToList(), HasMoreResults = false };
}
}
public TableServiceContext AzureTableService {
get {
var context = AzureTableClient.GetTableServiceContext();
context.IgnoreResourceNotFoundException = true;
return context;
}
}
public CloudTableClient AzureTableClient {
get {
return mStorageAccount.CreateCloudTableClient();
}
}
Solution
For anyone with the same question, here is the updated code.
/* Add the following Using Statement */
using Microsoft.WindowsAzure.Storage.Table.Queryable;
public AzureTablePage<T> GetPagedResults<T>(Expression<Func<T, bool>> whereCondition, string ContinuationToken, int PageSize, string TableName) where T : class, ITableEntity, new() {
TableContinuationToken token = GetToken(ContinuationToken);
var query = AzureTableClient.GetTableReference(TableName).CreateQuery<T>().Where(whereCondition).Take(PageSize).AsTableQuery();
var results = query.ExecuteSegmented(token, new TableRequestOptions() { PayloadFormat = TablePayloadFormat.JsonNoMetadata });
if (results.ContinuationToken != null) {
return new AzureTablePage<T>() { Results = results.ToList(), HasMoreResults = true, ContinuationToken = string.Join("|", results.ContinuationToken.NextPartitionKey, results.ContinuationToken.NextRowKey) };
} else {
return new AzureTablePage<T>() { Results = results.ToList(), HasMoreResults = false };
}
}
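The GetToken helper is not shown in the question or the solution. A minimal sketch of what the round trip could look like is below, assuming the "partitionKey|rowKey" format produced by the string.Join call above and assuming the keys themselves never contain a "|" character. TokenCodec is a hypothetical name, and the sketch returns a plain tuple rather than a TableContinuationToken so that it runs without the storage library.

```csharp
using System;

public static class TokenCodec
{
    // Formats the two keys the same way the paged result above does.
    public static string Encode(string nextPartitionKey, string nextRowKey)
    {
        return string.Join("|", nextPartitionKey, nextRowKey);
    }

    // Returns null for a missing token (first page); otherwise the two key parts.
    // A null row key is encoded as an empty string, so it decodes as "".
    public static (string NextPartitionKey, string NextRowKey)? Decode(string token)
    {
        if (string.IsNullOrEmpty(token)) return null;
        int split = token.IndexOf('|');
        if (split < 0) throw new FormatException("Expected 'partitionKey|rowKey'.");
        return (token.Substring(0, split), token.Substring(split + 1));
    }
}
```

In the real method, the decoded parts would be assigned to the NextPartitionKey and NextRowKey properties of a new TableContinuationToken, and a null result would be passed through as a null token.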
Please see the Tables Deep Dive blog post that we published when we first introduced the new Table Service Layer. If you need LINQ support, please also see the Azure Storage Client Library 2.1 blog post.
We strongly recommend upgrading to the Table Service Layer, because it is optimized for NoSQL scenarios and therefore provides much better performance.
I wrote the following function to get the SharePointDocumentLocation records regarding an account or contact. However, even though I provide an id which most definitely has an SPDL record associated with it, a count on the EntityCollection that is returned is always 0. Why does my query not return SPDL records?
internal static EntityCollection GetSPDocumentLocation(IOrganizationService service, Guid id)
{
SharePointDocumentLocation spd = new SharePointDocumentLocation();
QueryExpression query = new QueryExpression
{
EntityName = "sharepointdocumentlocation",
ColumnSet = new ColumnSet("sharepointdocumentlocationid"),
Criteria = new FilterExpression
{
Conditions =
{
new ConditionExpression
{
AttributeName = "regardingobjectid",
Operator = ConditionOperator.Equal,
Values = { id }
}
}
}
};
return service.RetrieveMultiple(query);
}
The following code does work
using System;
using Microsoft.Xrm.Sdk;
using Microsoft.Xrm.Sdk.Client;
using System.ServiceModel.Description;
using System.Net;
using Microsoft.Xrm.Sdk.Query;
namespace CRMConsoleTests
{
class Program
{
static void Main(string[] args)
{
ClientCredentials credentials = new ClientCredentials();
credentials.Windows.ClientCredential = CredentialCache.DefaultNetworkCredentials;
Uri orgUri = new Uri("http://localhost/CRMDEV2/XRMServices/2011/Organization.svc");
Uri homeRealmUri = null;
using (OrganizationServiceProxy service = new OrganizationServiceProxy(orgUri, homeRealmUri, credentials, null))
{
//ConditionExpression ce = new ConditionExpression("regardingobjectid", ConditionOperator.Equal, new Guid(""));
QueryExpression qe = new QueryExpression("sharepointdocumentlocation");
qe.ColumnSet = new ColumnSet(new String[] { "sharepointdocumentlocationid", "regardingobjectid" });
//qe.Criteria.AddCondition(ce);
EntityCollection result = service.RetrieveMultiple(qe);
foreach (Entity entity in result.Entities)
{
Console.WriteLine("Results for the first record: ");
SharePointDocumentLocation spd = entity.ToEntity<SharePointDocumentLocation>();
if (spd.RegardingObjectId != null)
{
Console.WriteLine("Id: " + spd.SharePointDocumentLocationId.ToString() + " with RoId: " + spd.RegardingObjectId.Id.ToString());
}
}
Console.ReadLine();
}
}
}
}
It retrieves 4 records, and when I debug the plugin code above it retrieves 3 records.
Everything looks good with your QueryExpression, although I'd write it a little more concisely (something like this):
var qe = new QueryExpression(SharePointDocumentLocation.EntityLogicalName){
ColumnSet = new ColumnSet("sharepointdocumentlocationid"),
};
qe.Criteria.AddCondition("regardingobjectid", ConditionOperator.Equal, id);
Because I don't see anything wrong with the QueryExpression, that leaves me with two guesses:
1. You're using impersonation on the IOrganizationService and the impersonated user doesn't have rights to the SharePointDocumentLocation. You won't get an error, you just won't get any records returned.
2. The id you're passing in is incorrect.
I'd remove the Criteria and see how many records you get back. If you don't get all of the records back, you know your issue is with guess #1.
If you get all records, add the regardingobjectid to the ColumnSet and retrieve the first record without any Criteria in the QueryExpression, then call this method passing in the id of the regardingobject you returned. If nothing is received when adding the regardingobjectid constraint, then something else is wrong.
Update
Since this is executing within the plugin's delete step, CRM must be performing its cascade deletes before your plugin fires. You can try registering the plugin in the Pre-Validation stage.
Now that I think of it, it must perform the deletion of the cascading entities in the Validation stage, because if one of them is unable to be deleted, the entity itself can't be deleted.