Azure Cosmos DB SQL record counts

I have a CosmosDB Collection which I'm querying using the REST API.
I'd like to access the total number of documents that match my query. I know I can do a count, but that means two calls: one for the count and a subsequent one to retrieve the actual records.
I would assume this is not possible in a single call, but the Data Explorer in the Azure Portal seems to manage it, so I'm wondering if anyone has figured out what calls it makes to get this:
Showing Results 1 - 10
Retrieved document count 342
Retrieved document size 2868425 bytes
Output document count 10
It's the Retrieved Document Count I need - if the portal can do it, there ought to be a way :)
I've tried the Java SDK as well as REST, but I can't see any useful options in there either.

As so often is the case in this game, asking a question triggers the answer... so apologies in advance.
The answer is to send the x-ms-documentdb-populatequerymetrics header in the request.
The response then gives a whole bunch of useful information in the x-ms-documentdb-query-metrics header.
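For what it's worth, here is a minimal sketch of the same round trip using the Python SDK (azure-cosmos); the account details and query are placeholders, and reading the metrics (which include retrievedDocumentCount) via client_connection.last_response_headers is an assumption about how the SDK surfaces response headers:

from azure.cosmos import CosmosClient

client = CosmosClient("https://<account>.documents.azure.com:443/", credential="<key>")
container = client.get_database_client("<db>").get_container_client("<coll>")

# populate_query_metrics=True sends x-ms-documentdb-populatequerymetrics on the request.
items = list(container.query_items(
    query="SELECT * FROM c",
    enable_cross_partition_query=True,
    populate_query_metrics=True,
))

# The metrics come back in the x-ms-documentdb-query-metrics response header.
headers = container.client_connection.last_response_headers
print(headers.get("x-ms-documentdb-query-metrics"))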
What I would still like to understand is whether this has any performance impact.

Related

Pagination for results generated when a third-party system hits the Salesforce API

I am trying to implement pagination in the following scenario:
A third-party system hits a Salesforce API and gets back results from an object in Salesforce.
They want the results paginated, getting the response from Salesforce by passing the parameters below from their end:
PageIndex/No
PageSize
I don't necessarily have to display the records using Visualforce/LWC, etc. Just the paginated records need to be passed back to the third-party system.
All the resources I found on the web involve using some VF page, component, etc. If using one of those is necessary to implement this pagination, please let me know that as well.
I tried looking for resources on implementing pagination, but they all involve a VF page, Lightning component, etc.
I expected: simple pagination on the records returned from the Salesforce web service.
For smaller data sets you can use SOQL's LIMIT and OFFSET. You can offset (skip) up to 2000 records that way.
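As a sketch, translating PageIndex/PageSize into such a SOQL query over the REST API might look like this in Python; the instance URL, access token, API version, and object fields are all placeholders:

import requests

INSTANCE_URL = "https://<instance>.my.salesforce.com"   # placeholder
HEADERS = {"Authorization": "Bearer <access-token>"}    # placeholder

def fetch_page(page_index, page_size):
    # Salesforce caps OFFSET at 2000, so this approach only fits small data sets.
    offset = (page_index - 1) * page_size
    soql = f"SELECT Id, Name FROM Account ORDER BY Id LIMIT {page_size} OFFSET {offset}"
    resp = requests.get(
        f"{INSTANCE_URL}/services/data/v58.0/query",
        headers=HEADERS,
        params={"q": soql},
    )
    resp.raise_for_status()
    return resp.json()["records"]

# Example: second page of 25 records.
page2 = fetch_page(page_index=2, page_size=25)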
A more general solution would be to use Salesforce's "query locators" (a bit like a CURSOR statement in a normal database). You need to read up on queryMore() (if you use the SOAP API) to get the next chunk of data (no jumping around with page numbers, just next, next...).
The REST API has the similar nextRecordsUrl: https://stackoverflow.com/a/56448825/313628
If they can't implement that, or you don't want to give access to the normal API, just some Apex... at most you can query 50K records in one transaction, so you could make do that way. You'll likely need to put some rules around a fixed ordering of the records.
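And a sketch of the query-locator style over REST, following nextRecordsUrl chunk by chunk (same placeholder credentials as above; there is no page number, just "next"):

import requests

INSTANCE_URL = "https://<instance>.my.salesforce.com"   # placeholder
HEADERS = {"Authorization": "Bearer <access-token>"}    # placeholder

resp = requests.get(
    f"{INSTANCE_URL}/services/data/v58.0/query",
    headers=HEADERS,
    params={"q": "SELECT Id, Name FROM Account ORDER BY Id"},
).json()

records = resp["records"]
# Salesforce sets done=False and supplies nextRecordsUrl while more chunks remain.
while not resp["done"]:
    resp = requests.get(INSTANCE_URL + resp["nextRecordsUrl"], headers=HEADERS).json()
    records.extend(resp["records"])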

Get all `Facets` from Azure Search without item results

Hi all, I'm facing performance issues with Azure Cognitive Search; currently I have 956 facet fields.
When I load documents from the Azure server it takes almost 30 to 35 seconds.
But when I remove facets from the Azure Search request, documents load in 2 to 3 seconds.
So for this, I have created two APIs:
The first API loads document results from the Azure server.
The second API loads all facets from the Azure server.
Is there any way to load only facets?
Code to get the documents from the Azure server:
// Resolve the search-result wrapper from the DI container.
ISearchFilterResult searchResult = DependencyResolver.Current.GetService<ISearchFilterResult>();
WriteToFile("Initiate request call for search result ProcessAzureSearch {0}");
// Execute the search with the configured SearchParameters (facets included).
DocumentSearchResult<AzureSearchItem> results =
    searchServiceClient.Documents.Search<AzureSearchItem>(searchWord, parameters);
WriteToFile("Response received for search result {0}");
Faceting is an aggregation operation that's performed over the matching results and is quite intensive when there are a lot of distinct buckets. I can't comment on the specific increase in latency but adding facets to the query definitely has a performance impact.
Since faceting computes aggregation on matching documents, it has to run the query in the backend but as Gaurav mentioned, specifying top = 0 will prevent the actual retrieval as it doesn't need to be included in the response. This could improve the performance especially if the individual docs are large.
Another possibility is to run just the query first and then use an identifier field to filter the docs with facets. Since filtering is faster than querying, the overall latency should improve. This only works if you're able to identify the id groups for the resultant docs from the first API call, as in the sketch below.
In general I'd recommend using facets judiciously and re-evaluate the design if there is a need to run faceting queries on a field with high cardinality. Here's a document on optimizing search performance that you can take a look at -
https://learn.microsoft.com/en-us/azure/search/search-performance-optimization
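To illustrate that two-step idea, a sketch with the Python SDK (azure-search-documents); the endpoint, index, key, and the id/facet field names are placeholders, and search.in is used for the id filter:

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

client = SearchClient(
    "https://<service>.search.windows.net",   # placeholder
    "<index>",                                # placeholder
    AzureKeyCredential("<api-key>"),          # placeholder
)

# 1st call: fetch just the matching documents, no facets (fast).
docs = list(client.search(search_text="<search word>", select=["id"], top=50))
ids = ",".join(d["id"] for d in docs)

# 2nd call: facet-only aggregation restricted to those ids via a filter.
facet_results = client.search(
    search_text="*",
    filter=f"search.in(id, '{ids}')",
    facets=["<facet-field>"],
    top=0,   # no documents in the response, just facet buckets
)
print(facet_results.get_facets())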
SearchParameters has a property called Top which instructs the Search Service to return that number of documents.
Gets or sets the number of search results to retrieve. This can be
used in conjunction with $skip to implement client-side paging of
search results. If results are truncated due to server-side paging,
the response will include a continuation token that can be used to
issue another Search request for the next page of results.
One possible solution would be to set this value to 0 in your Facets API; in that case no documents will be returned by the Search Service.
I am not sure about the performance implications of this approach, though. I just tried it with a very small set of data and it worked just fine for me.
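For completeness, the facets-only call with top=0 in the Python SDK (azure-search-documents) might look like this; the endpoint, index, and facet fields are placeholders:

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

client = SearchClient(
    "https://<service>.search.windows.net",   # placeholder
    "<index>",                                # placeholder
    AzureKeyCredential("<api-key>"),          # placeholder
)

# top=0 computes facets over the matching documents but returns none of them.
results = client.search(search_text="<search word>", facets=["<facet-field>"], top=0)
print(results.get_facets())   # only the facet buckets come back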

Azure Cosmos DB Python SDK : Query items from change feed using checkpoints?

Newbie to Cosmos DB... please shed some light.
@Matias Quaranta - Thank you for the samples.
From the official samples it seems like the Change feed can be queried either from the beginning or from a specific point in time.
options["startFromBeginning"] = True
or
options["startTime"] = time
What other options does the QueryItemsChangeFeed method support?
Does it support querying from a particular check point within a partition?
Glad the samples are useful. In theory, the concept of "checkpoints" does not exist in the Change Feed. "Checkpoints" is basically you storing the last processed batch or continuation after every execution in case your process halts.
When the process starts again, you can take your stored continuation and use it.
This is what the Change Feed Processor Library and our Azure Cosmos DB Trigger for Azure Functions do for you internally.
To pass the continuation in Python, you can use options['continuation'], and you should be able to get it from the 'x-ms-continuation' response header.
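A minimal sketch of that checkpointing loop with the legacy pydocumentdb client the samples use; the account details are placeholders, and load_saved_continuation/save_continuation/process are hypothetical helpers standing in for wherever you persist the checkpoint and handle documents:

import pydocumentdb.document_client as document_client

client = document_client.DocumentClient(
    "https://<account>.documents.azure.com:443/",   # placeholder
    {"masterKey": "<key>"},                         # placeholder
)
collection_link = "dbs/<db>/colls/<coll>"           # placeholder

continuation = load_saved_continuation()  # hypothetical helper; None on first run
options = {"continuation": continuation} if continuation else {"startFromBeginning": True}

for doc in client.QueryItemsChangeFeed(collection_link, options):
    process(doc)  # hypothetical helper

# Persist the continuation for the next run - this is the "checkpoint".
save_continuation(client.last_response_headers.get("x-ms-continuation"))  # hypothetical helper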
Referring to the sample code ReadFeedForTime, I tried options["startTime"], but it doesn't work; the response is the same list of documents as when starting from the beginning.

PowerApps Collection Limited to 5 Entries

I'm trying to solve an issue with Microsoft PowerApps where you appear to be limited to storing only 5 values in a collection. I have been looking around for a while now to find a solution.
What I am essentially trying to do is create an offline issue logger from a tablet, where users will sync their devices to a database to retrieve all existing entries. They will then go offline to a customer site and take pictures and log issues offline to then sync when they get back to the office.
With this issue persisting, I cannot import more than 5 issues from the database and I cannot log more than 5 issues to then upload to the database.
I have gone through the documentation a few times now trying to find anything stating whether the storage is limited or not. I haven't been successful.
Tutorials such as : https://powerapps.microsoft.com/en-us/tutorials/show-images-text-gallery-sort-filter/ show that they are adding 5 products to work with, but that is the only mention of data in a collection.
Is anyone else experiencing the same issue? Or could anyone suggest a way around this?
Thank you
Update: The Collection view under File > Collections only shows 5 items in the table.
If you create a dropdown of the data, all the values are saved offline.
By default a collection can hold up to 500 entries; if you need more than this, you can write code to expand the limit. If you go to File > Collections, it only shows 5 items as a preview of the data. This is replicated in the tutorials and can lead you to believe that 5 is the maximum number of items you can store.

Azure table storage access times

I have data stored in Table Storage. When I try to retrieve the data I do this using the partition key and row key. I have been doing some timings to retrieve data of around 8000 bytes.
I'm getting times ranging from 500-700ms and YES my host and storage are in the same data center.
Is Table Storage really so slow, or am I doing something very wrong? I was expecting access times more like 50ms. Bear in mind that all of my tables added together probably only hold 200 rows.
Your performance numbers certainly sound very poor - and much worse than I've seen.
There are some useful reference numbers - and some good advice - on the storage team blog - see http://blogs.msdn.com/b/windowsazurestorage/archive/2010/11/06/how-to-get-most-out-of-windows-azure-tables.aspx
For your specific problem, I suggest writing some very simple test code to measure your numbers again - if you are still seeing the same problems, then post the code here and - if your code really is trivial - then contact MS support.
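If it helps, a trivial timing harness in Python using the current azure-data-tables SDK (an assumption - the original question predates this SDK); the connection string, table, and keys are placeholders:

import time
from azure.data.tables import TableClient

client = TableClient.from_connection_string(
    "<connection-string>",   # placeholder
    table_name="<table>",    # placeholder
)

# Time repeated point reads (PartitionKey + RowKey), the fastest access pattern.
# The first call pays connection-setup cost, so steady-state latency is what matters.
for i in range(5):
    start = time.perf_counter()
    client.get_entity(partition_key="<pk>", row_key="<rk>")
    print(f"read {i}: {(time.perf_counter() - start) * 1000:.1f} ms")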
Are you trying to retrieve multiple entities at once? If so, there is a known bug in the query parser of Table Storage: indexes do not get used when multiple entities are queried directly by their RowKey; instead the request generates a linear scan of the table, which can indeed take 500 to 700ms per roundtrip.
