It is sometimes said that when using Azure Tables there is effectively a 3rd key partitioning data - the Table Name itself.
I noticed when executing a segmented query that the TableContinuationToken has a NextTableName property. What is the purpose of this property? Would it be useful if a query could span multiple tables?
It is used for segmented queries, when the full result can't be returned in a single response. NextTableName applies to the Query Tables operation, which lists the tables in a storage account.
Quoted from https://learn.microsoft.com/en-us/rest/api/storageservices/query-tables :
A query against the Table service may return a maximum of 1,000 tables at one time and may execute for a maximum of five seconds. If the result set contains more than 1,000 tables, if the query did not complete within five seconds, or if the query crosses the partition boundary, the response includes a custom header containing the x-ms-continuation-NextTableName continuation token. The continuation token may be used to construct a subsequent request for the next page of data. For more information about continuation tokens, see Query Timeout and Pagination.
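As a minimal sketch of how that token feeds the next request: the value of the x-ms-continuation-NextTableName response header is passed back as the NextTableName query parameter on the following Query Tables call. The account name, the token value and the helper function are hypothetical, and authentication headers are omitted.

```python
from urllib.parse import urlencode


def next_tables_url(account, next_table_name=None):
    """Build a Query Tables URL, resuming after a continuation token if given."""
    base = f"https://{account}.table.core.windows.net/Tables"
    if next_table_name is None:
        return base  # first page: there is no token yet
    # Pass the header value back as the NextTableName query parameter.
    return base + "?" + urlencode({"NextTableName": next_table_name})


first = next_tables_url("myaccount")
second = next_tables_url("myaccount", "token123")  # placeholder token value
```

So the token does not make a query span multiple tables; it only marks where the listing of tables should resume.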
The query from Logic Apps does not search across all partitions in Cosmos DB when the partition key value field is empty. When the exact same query is run from Data Explorer, all partitions are queried and all expected data is returned. When the query is run from Logic Apps, I can see in Log Analytics that a query is only run for one partition range, and not all the expected results are returned (only those from the partitions that were hit).
From the docs for partition key value: Value must be provided according to its type ("string", 42, 0.5). If empty, all partitions will be used to search for documents.
The expected behaviour is for the Logic Apps Query documents V5 connector to return the same results for a SQL query as the Cosmos DB Data Explorer.
Query documents v5 can return an empty array in response to a query in CosmosDB, but the intended behaviour is for the user to check if the response has a continuation token, implying the query should be run again, passing in the continuation token as a header for the new request. This needs to be done to retrieve all the data.
Copy pasting a comment from another thread:
When executing queries using the REST API, make sure you are consuming and verifying the x-ms-continuation header. Reference: https://learn.microsoft.com/rest/api/cosmos-db/common-cosmosdb-rest-response-headers
You can iterate (send more requests) until the Continuation Token is not being returned in the response.
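A minimal sketch of that loop in Python, assuming a caller-supplied function that performs one request. The send_query callable and the in-memory fake_send stand-in are hypothetical, not part of any SDK or connector; only the x-ms-continuation header name comes from the REST reference above.

```python
def query_all(send_query):
    """Collect documents from every page of a Cosmos DB query.

    send_query(continuation) is a hypothetical caller-supplied function that
    performs one request, sending `continuation` as the x-ms-continuation
    request header when it is not None, and returns (documents, headers).
    """
    documents, continuation = [], None
    while True:
        page, headers = send_query(continuation)
        documents.extend(page)  # a page may be empty; that alone is not the end
        continuation = headers.get("x-ms-continuation")
        if not continuation:
            return documents


# Stand-in for the service: three pages, the second one empty.
_pages = {
    None: ([{"id": "a"}], {"x-ms-continuation": "t1"}),
    "t1": ([], {"x-ms-continuation": "t2"}),
    "t2": ([{"id": "b"}], {}),
}


def fake_send(continuation):
    return _pages[continuation]


all_docs = query_all(fake_send)
```

The loop stops only when the token is missing, never because a page came back empty.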
I can't get more than 1000 items from an Azure Storage Table. I've tried adding these headers as per this link:
OData-MaxVersion: 4.0
OData-Version: 4.0
Prefer: odata.maxpagesize=3
I've also looked for an @odata.nextLink property in the response to get the next page of items, but none is returned. The table has around 2050 items.
This is the default behaviour, and it is by design, for performance reasons.
A query against the Table service may return a maximum of 1,000 items at one time and may execute for a maximum of five seconds. If the result set contains more than 1,000 items, if the query did not complete within five seconds, or if the query crosses the partition boundary, the response includes headers which provide the developer with continuation tokens to use in order to resume the query at the next item in the result set. Continuation token headers may be returned for a Query Tables operation or a Query Entities operation.
See Query timeout and pagination on how to use continuation token headers to get the next set of results.
One important extra piece of information:
A request that returns more than the default maximum or specified maximum number of results returns a continuation token for performing pagination. When making subsequent requests that include continuation tokens, be sure to pass the original URI on the request. For example, if you have specified a $filter, $select, or $top query option as part of the original request, you will want to include that option on subsequent requests. Otherwise your subsequent requests may return unexpected results.
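As a rough sketch of what "pass the original URI" means for Query Entities: the x-ms-continuation-NextPartitionKey and x-ms-continuation-NextRowKey response headers become NextPartitionKey and NextRowKey query parameters, added on top of the unchanged original options. The helper name, the URL and the sample values are assumptions for illustration, and authentication is left out.

```python
from urllib.parse import quote, urlencode


def next_entities_url(base_url, original_options, response_headers):
    """Return the URL for the next page, or None when no token was returned."""
    next_pk = response_headers.get("x-ms-continuation-NextPartitionKey")
    next_rk = response_headers.get("x-ms-continuation-NextRowKey")
    if next_pk is None:
        return None  # no continuation token: the result set is complete
    params = dict(original_options)  # keep $filter, $select, $top unchanged
    params["NextPartitionKey"] = next_pk
    if next_rk is not None:  # the row key part of the token may be missing
        params["NextRowKey"] = next_rk
    return base_url + "?" + urlencode(params, safe="$", quote_via=quote)


base = "https://myaccount.table.core.windows.net/mytable()"
options = {"$filter": "DeviceId eq 99", "$top": "15"}
headers = {
    "x-ms-continuation-NextPartitionKey": "pk1",
    "x-ms-continuation-NextRowKey": "rk1",
}
url = next_entities_url(base, options, headers)
```

Dropping the $filter on the second request would silently change which entities come back, which is the "unexpected results" the quote warns about.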
I have inserted exactly 1 million documents in an Azure Cosmos DB SQL container using the Bulk Executor. No errors were logged. All documents share the same partition key. The container is provisioned for 3,200 RU/s, unlimited storage capacity and single-region write.
When performing a simple count query:
select value count(1) from c where c.partitionKey = @partitionKey
I get results varying from 303,000 to 307,000.
This count query works fine for smaller partitions (from 10k up to 250k documents).
What could cause this strange behavior?
This is expected behaviour in Cosmos DB. Firstly, Document DB imposes limits on response page size. This link summarizes some of those limits: Azure DocumentDb Storage Limits - what exactly do they mean?
Secondly, if you want to query large amounts of data from Document DB, you have to consider query performance. Please refer to this article: Tuning query performance with Azure Cosmos DB.
Looking at the Document DB REST API, you can see several parameters that have a significant impact on query operations: x-ms-max-item-count and x-ms-continuation.
So the varying count is caused by the RU setting being a bottleneck. The count query is limited by the number of RUs allocated to your collection, so the result you received will have come with a continuation token.
You have two options:
1. You could raise the RU setting.
2. To keep costs down, you could keep fetching the next set of results via the continuation token and add up the partial counts to get the total count (probably in the SDK).
You could set the Max Item Count value and paginate your data using continuation tokens. The Document DB SDK supports reading paginated data seamlessly. You could refer to the Python snippet below:
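# Create the query iterable; each page holds at most 10 items.
q = client.QueryDocuments(collection_link, query, {'maxItemCount': 10})
# Fetch the first page. _fetch_function is a private method and may change
# between SDK versions; it returns (results, response_headers).
results_1 = q._fetch_function({'maxItemCount': 10})
# The continuation token is a string representing a JSON object.
token = results_1[1]['x-ms-continuation']
# Fetch the second page by passing the token back.
results_2 = q._fetch_function({'maxItemCount': 10, 'continuation': token})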
I imported exactly 30k documents into my database. Then I ran the query select value count(1) from c in Query Explorer. Each page returned only a partial count of the total documents, so I had to add them all up by clicking the Next Page button.
You could, of course, run this query in SDK code using the continuation token.
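Adding up the partial counts could look roughly like this. The fetch_page callable is a hypothetical wrapper around whatever SDK or REST call you use, and the page values in the fake are made-up numbers chosen only to add up to one million.

```python
def total_count(fetch_page):
    """Sum the partial counts returned by each page of a COUNT query.

    fetch_page(token) is a hypothetical callable returning
    (partial_counts, next_token), with next_token None on the last page.
    """
    total, token = 0, None
    while True:
        partial_counts, token = fetch_page(token)
        total += sum(partial_counts)  # a page holds zero or more partial counts
        if token is None:
            return total


# Stand-in for the service: each page reports the documents counted so far
# since the previous token (illustrative values only).
_count_pages = {
    None: ([305000], "t1"),
    "t1": ([400000], "t2"),
    "t2": ([295000], None),
}

count = total_count(lambda token: _count_pages[token])
```

Reading only the first page would give a number like the 303,000 to 307,000 seen in the question, because it reflects just the work done before the first token was issued.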
We are using the .NET Azure Storage client library to retrieve data from the server. But when we try to retrieve data, the result has 0 items and a continuation token. When we fetch the next page with this continuation token, we get the same result again. However, when we use the 4th continuation token fetched like this, we get the proper result with 15 items. (The requested item count for all requests is 15.) This issue is observed only when we apply filter conditions. The code used to fetch the result is given below:
var tableReference = _tableClient.GetTableReference(tableName);
var query = new TableQuery();
query.Where("DeviceId eq 99"); // DeviceId is of type Int32, so neither the property name nor the value is quoted
query.TakeCount = 15;
var resultsQuery = tableReference.ExecuteQuerySegmented(query, token);
var nextToken = resultsQuery.ContinuationToken;
var results = resultsQuery.ToList();
This is expected behavior. From Query Timeout and Pagination:
A query against the Table service may return a maximum of 1,000 items
at one time and may execute for a maximum of five seconds. If the
result set contains more than 1,000 items, if the query did not
complete within five seconds, or if the query crosses the partition
boundary, the response includes headers which provide the developer
with continuation tokens to use in order to resume the query at the
next item in the result set. Continuation token headers may be
returned for a Query Tables operation or a Query Entities operation.
I noticed that you're not using PartitionKey in your query. This will result in a full table scan. My recommendation would be to always use PartitionKey (and possibly RowKey) in your queries to avoid full table scans. I would highly recommend reading Azure Storage Table Design Guide: Designing Scalable and Performant Tables to get the most out of Azure Tables.
UPDATE: Explaining "If the query crosses the partition boundary"
Let me try to explain, with an example, what I understand by partition boundary. Let's assume you have 1 million rows in your table, evenly spread across 10 partitions (let's assume your PartitionKeys are 001, 002, 003, ... 010). We know that the data in Azure Tables is organized by PartitionKey, and then within a partition by RowKey. Since you did not specify a PartitionKey in your query, the Table service starts from the first partition (i.e. PartitionKey == 001) and tries to find matching data there. If it does not find any data in that partition, it does not know whether there is matching data in another partition, so instead of going on to the next partition it simply returns with a continuation token and leaves it to the client consuming the API to decide whether to continue the search using the same parameters plus the continuation token, or to revise the search and start again.
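To make that concrete, here is a toy model in Python. It is only an illustration of the behaviour described above, not the service's actual algorithm: it assumes each request scans exactly one partition and uses the next PartitionKey as the token. The function name and the sample data are made up.

```python
def query_segment(partitions, predicate, token=None):
    """Toy model of one request: scan a single partition, then stop.

    `partitions` maps PartitionKey -> list of rows. Returns
    (matches, next_token), where next_token is the next PartitionKey
    to scan, or None once the last partition has been scanned.
    """
    keys = sorted(partitions)
    start = 0 if token is None else keys.index(token)
    matches = [row for row in partitions[keys[start]] if predicate(row)]
    next_token = keys[start + 1] if start + 1 < len(keys) else None
    return matches, next_token


table = {
    "001": [{"DeviceId": 1}],
    "002": [{"DeviceId": 2}],
    "003": [{"DeviceId": 99}],
}

token, pages = None, []
while True:
    rows, token = query_segment(table, lambda r: r["DeviceId"] == 99, token)
    pages.append(rows)
    if token is None:
        break
```

In this model the first two pages are empty and the third holds the match, which mirrors getting several empty segments with continuation tokens before the real results show up.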
As stated at https://msdn.microsoft.com/en-us/library/azure/dd135718.aspx
"It is possible for a query to return no results but to still return a continuation header."
So my question is: what should the behaviour of the caller be in that case?
Retry again after some time?
Consider it as end of the results set?
Make new query without cont. token updating filters based on the last data retrieved?
It is also said,
"A query against the Table service may return a maximum of 1,000 items at one time and may execute for a maximum of five seconds. If the result set contains more than 1,000 items, if the query did not complete within five seconds, or if the query crosses the partition boundary, the response includes headers which provide the developer with continuation tokens to use in order to resume the query at the next item in the result set. Continuation token headers may be returned for a Query Tables operation or a Query Entities operation."
So it looks like the retry strategy can lead us into an infinite loop if empty results with a continuation token are always returned...
You should immediately build and send the next query, passing the continuation token in it, as described here: Query Timeout and Pagination. Table storage will be searched beginning from the position where the previous request ended, so each request makes progress and the loop ends once no token is returned. You got an empty result because table storage didn't find any matching data within the five seconds, but there is still data left to search.
After you have retrieved the continuation tokens, use their values to construct a query to return the next page of results.
If you use .NET and the Microsoft.WindowsAzure.Storage assembly, there is a BeginExecuteQuerySegmented method which builds the requests for you. Example: https://stackoverflow.com/a/13428086/1051244