I can't get more than 1,000 items from an Azure Storage Table. I've tried adding these headers as per this link:
OData-MaxVersion: 4.0
OData-Version: 4.0
Prefer: odata.maxpagesize=3
I've also looked for a returned @odata.nextLink property to get the next page of items, but none is returned. The table has around 2,050 items.
This is the default behavior and was added as a performance optimization.
A query against the Table service may return a maximum of 1,000 items at one time and may execute for a maximum of five seconds. If the result set contains more than 1,000 items, if the query did not complete within five seconds, or if the query crosses the partition boundary, the response includes headers which provide the developer with continuation tokens to use in order to resume the query at the next item in the result set. Continuation token headers may be returned for a Query Tables operation or a Query Entities operation.
See Query timeout and pagination on how to use continuation token headers to get the next set of results.
One important extra piece of information:
A request that returns more than the default maximum or specified maximum number of results returns a continuation token for performing pagination. When making subsequent requests that include continuation tokens, be sure to pass the original URI on the request. For example, if you have specified a $filter, $select, or $top query option as part of the original request, you will want to include that option on subsequent requests. Otherwise your subsequent requests may return unexpected results.
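The "pass the original URI" requirement above can be sketched as a small helper. This is a hypothetical Python sketch (the function and variable names are mine); only the x-ms-continuation-NextPartitionKey / NextRowKey response headers and the NextPartitionKey / NextRowKey query options come from the Query Entities REST API:

```python
from urllib.parse import urlencode

def next_page_url(base_url, original_params, response_headers):
    """Build the URL for the next page, or return None when no token came back."""
    # Keep the original query options ($filter, $select, $top, ...) intact,
    # otherwise subsequent pages may return unexpected results.
    params = dict(original_params)
    npk = response_headers.get("x-ms-continuation-NextPartitionKey")
    nrk = response_headers.get("x-ms-continuation-NextRowKey")
    if npk is None:
        return None  # no continuation token: the result set is complete
    params["NextPartitionKey"] = npk
    if nrk is not None:
        params["NextRowKey"] = nrk
    return base_url + "?" + urlencode(params)
```

A client would call this after every response and stop only when it returns None, not when a page happens to be empty.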
Related
It is sometimes said that when using Azure Tables there is effectively a third key partitioning the data: the table name itself.
I noticed when executing a segmented query that the TableContinuationToken has a NextTableName property. What is the purpose of this property? Could it be useful when a query spans multiple tables?
It's for segmented queries, when the full result set can't be returned in a single response.
Quoting from https://learn.microsoft.com/en-us/rest/api/storageservices/query-tables :
A query against the Table service may return a maximum of 1,000 tables at one time and may execute for a maximum of five seconds. If the result set contains more than 1,000 tables, if the query did not complete within five seconds, or if the query crosses the partition boundary, the response includes a custom header containing the x-ms-continuation-NextTableName continuation token. The continuation token may be used to construct a subsequent request for the next page of data. For more information about continuation tokens, see Query Timeout and Pagination.
We are using the .NET Azure Storage client library to retrieve data from the server. When we try to retrieve data, the result has 0 items but includes a continuation token. When we fetch the next page with this continuation token, we get the same result again. Only when we use the 4th continuation token fetched this way do we get the proper result with 15 items (the take count for all requests is 15). This issue is observed only when we apply filter conditions. The code used to fetch the results is given below:
var tableReference = _tableClient.GetTableReference(tableName);
var query = new TableQuery();
query.Where("DeviceId eq 99"); // DeviceId is of type Int32, so the value is not quoted
query.TakeCount = 15;
var resultsQuery = tableReference.ExecuteQuerySegmented(query, token);
var nextToken = resultsQuery.ContinuationToken;
var results = resultsQuery.ToList();
This is expected behavior. From Query Timeout and Pagination:
A query against the Table service may return a maximum of 1,000 items at one time and may execute for a maximum of five seconds. If the result set contains more than 1,000 items, if the query did not complete within five seconds, or if the query crosses the partition boundary, the response includes headers which provide the developer with continuation tokens to use in order to resume the query at the next item in the result set. Continuation token headers may be returned for a Query Tables operation or a Query Entities operation.
I noticed that you're not using PartitionKey in your query. This results in a full table scan. The recommendation is to always use PartitionKey (and possibly RowKey) in your queries to avoid full table scans. I would highly recommend reading Azure Storage Table Design Guide: Designing Scalable and Performant Tables to get the most out of Azure Tables.
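To make that recommendation concrete, here is a minimal, hypothetical Python helper for building a point-query filter string. The function name is mine; the OData comparison syntax and the rule that single quotes inside string values are doubled come from the Table service query documentation:

```python
def point_query_filter(partition_key, row_key=None):
    """Build an efficient Table service filter targeting one partition (and row)."""
    def quote(value):
        # Single quotes inside a string value are escaped by doubling them
        return "'" + value.replace("'", "''") + "'"
    flt = "PartitionKey eq " + quote(partition_key)
    if row_key is not None:
        # PartitionKey + RowKey together make this a point query, the
        # cheapest possible lookup in Azure Tables
        flt += " and RowKey eq " + quote(row_key)
    return flt
```

A filter like this lets the service jump straight to the right partition instead of scanning all of them.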
UPDATE: Explaining "If the query crosses the partition boundary"
Let me try to explain with an example what I understand by a partition boundary. Let's assume you have 1 million rows in your table evenly spread across 10 partitions (let's assume your PartitionKeys are 001, 002, 003, ..., 010). Now we know that the data in Azure Tables is organized by PartitionKey, and then within a partition by RowKey. Since your query did not specify a PartitionKey, the Table service starts from the 1st partition (i.e. PartitionKey == 001) and tries to find the matching data there. If it does not find any data in that partition, it does not know whether the data is in another partition, so instead of going on to the next partition it simply returns with a continuation token and leaves it to the client consuming the API to decide whether to continue the search using the same parameters plus the continuation token, or to revise the search and start again.
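A toy model of that behavior might look like the Python below. This is not the real service (the function names and the integer "token" are purely illustrative); the point it demonstrates is that a segment can be empty and still carry a continuation token, so the client must keep looping until the token is gone:

```python
def query_segment(table, partition_keys, token, predicate):
    """Scan a single partition; return (matches, next_token or None)."""
    idx = token or 0
    matches = [e for e in table.get(partition_keys[idx], []) if predicate(e)]
    # A token is returned whenever more partitions remain, even if this
    # segment matched nothing at all.
    next_token = idx + 1 if idx + 1 < len(partition_keys) else None
    return matches, next_token

def query_all(table, partition_keys, predicate):
    """Client-side drain loop: keep going until no token is returned."""
    results, token = [], None
    while True:
        matches, token = query_segment(table, partition_keys, token, predicate)
        results.extend(matches)  # a segment may legitimately be empty
        if token is None:
            return results
```

Treating an empty segment as "end of results" in this model would silently drop everything stored in later partitions.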
As stated at https://msdn.microsoft.com/en-us/library/azure/dd135718.aspx :
"It is possible for a query to return no results but to still return a continuation header."
So my question is: what should the caller's behaviour then be?
Retry after some time?
Treat it as the end of the result set?
Make a new query without the continuation token, updating the filters based on the last data retrieved?
It is also said,
"A query against the Table service may return a maximum of 1,000 items at one time and may execute for a maximum of five seconds. If the result set contains more than 1,000 items, if the query did not complete within five seconds, or if the query crosses the partition boundary, the response includes headers which provide the developer with continuation tokens to use in order to resume the query at the next item in the result set. Continuation token headers may be returned for a Query Tables operation or a Query Entities operation."
So it looks like the retry strategy can lead us into an infinite loop when an empty result with a continuation token is always returned...
You should immediately build the next query and pass the continuation token in it, as described here: Query Timeout and Pagination. The table storage will be searched beginning from the position where the previous request ended. You got an empty result because the table storage didn't find any matching data within the five seconds, but there is still data left to search.
After you have retrieved the continuation tokens, use their values to construct a query to return the next page of results.
If you use .NET and the assembly Microsoft.WindowsAzure.Storage there is a BeginExecuteQuerySegmented method which builds all the requests for you. Example: https://stackoverflow.com/a/13428086/1051244
I have an OData feed that contains a number of large tables (tens of millions of rows). I need to configure PowerQuery (or PowerPivot, whichever is the best tool for the job) to access this OData feed, but to do so in a paginated way, so that a single request doesn't try to return 10 million rows all at once but instead builds up the complete result of tens of millions of rows with multiple paginated queries. I don't want to manually submit many different URLs with different values of $top and $skip to do my own pagination; instead I need PowerQuery or PowerPivot to handle the pagination for me.
I was hoping that PQ/PP would just be smart enough to do pagination, perhaps by first issuing a "count" query to determine how many rows are present, but this appears not to be the case. When I give PQ/PP a URL to a large OData table, it just blindly issues a query to retrieve all rows (actually, it issues 2 such identical queries, which seems odd), which crashes the DB on the server.
In searching for an answer, I've seen hints that PQ/PP can do pagination, but no clue as to how to enable this behavior. So is there a way to tell PQ/PP to use some kind of pagination to access large data sets? If so, can I set the page size?
You can put the PageSize on the EnableQueryAttribute if you are using Web API:
[EnableQuery(PageSize = 10)]
public IHttpActionResult Get()
{
return Ok(customers);
}
You can use recursion to fetch and append successive pages. Each successive fetch uses a higher "start" line number in the URL, and the recursion ends when a fetch yields an empty list. In "M", the if statement can check for an empty list, and otherwise append and increment, using "@" to self-reference your current function name.
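That recursive approach can be sketched outside of "M" as well. Here is a hypothetical Python equivalent, where fetch_page stands in for the real data source and all names are mine:

```python
def fetch_all(fetch_page, start=0, page_size=100):
    """Return all rows by recursively fetching page after page."""
    page = fetch_page(start, page_size)
    if not page:          # an empty page ends the recursion
        return []
    # Equivalent of the "@" self-reference in M: call ourselves for the rest,
    # with the start offset advanced past the page we just fetched.
    return page + fetch_all(fetch_page, start + page_size, page_size)
```

The same structure translates directly back to M: a function that tests for the empty list and otherwise appends the page to a recursive call on the incremented offset.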
AFAIK, ExecuteQuery handles segmented queries internally, and every request (= call to storage) for a segment counts as a billable transaction against the storage account (see http://www.windowsazure.com/en-us/pricing/details/storage/). Right?
Is there a way to find out how many segments a query was split into? If I run a query and get 5,000 items, I can suppose that my query was split into 5 segments (due to the limit of 1,000 items per segment). But with a complex query there is also the timeout of 5 seconds per call.
I don't believe there's a way to get at that in the API. You could set up an HTTPS proxy to log the requests, if you just want to check it in development.
If it's really important that you know, use the BeginExecuteQuerySegmented and EndExecuteQuerySegmented calls instead. Your code will get a callback for each segment, so you can easily track how many calls there are.
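If you drive the segments yourself, counting the calls is straightforward. Below is a hedged Python sketch, with execute_segmented standing in for whatever segmented call your client library exposes (it is assumed to return the segment's items plus the next token, or None when done):

```python
def execute_counting(execute_segmented, query):
    """Drain all segments, returning (all_items, number_of_segment_calls)."""
    items, token, calls = [], None, 0
    while True:
        segment, token = execute_segmented(query, token)
        calls += 1                 # each segment is one storage transaction
        items.extend(segment)
        if token is None:
            return items, calls
```

The calls counter is exactly the number of billable transactions the query consumed, regardless of whether a segment was cut short by the 1,000-item limit, the 5-second timeout, or a partition boundary.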