I'm trying to query Cosmos DB, where I would like to search the last name with a SQL 'like'-style match. For case insensitivity I'm using Cosmos DB's 'lower' function.
e.g. SELECT * FROM c where contains(lower(c.lastName), "abc") // Request Charge = 1660+
I observed that the Request Charge for this query is around 1660+, but when I use a normal query without the 'contains' and 'lower' functions the Request Charge is just 8.
e.g. SELECT * FROM c where c.lastName = "abc" // Request Charge = 8
Please help to understand what's causing the huge difference in Request Charge?
The first query with contains is effectively doing a freetext search and will have to scan all documents.
The second query is using an exact match and thus will just use an index and cost less.
Besides the cost of the actual query path, the result set of the first query could also be much larger than that of the second: every match of the second query is also a match of the first, but the first can match many more values, e.g. abcd, cabcd.
If you're looking to search across a lot of text columns, then it is worth looking at one of two options.
If you are searching for whole words and the columns are not too long, then you could tokenize the columns into matching fields.
The more robust option would be to look at using Azure Search in combination with Cosmos DB.
This advice is from the Azure blog.
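As a rough illustration of the tokenizing option, the sketch below lowercases a column and splits it into whole-word tokens before the document is stored. The helper names `tokenize` and `with_search_tokens` and the `lastNameTokens` field are hypothetical, not from the blog. The assumption is that a query could then test the token array with something like ARRAY_CONTAINS(c.lastNameTokens, "abc") instead of calling contains and lower on every document. Note that this only matches whole words, not arbitrary substrings.

```python
import re


def tokenize(text):
    """Split text into lowercase alphanumeric word tokens, de-duplicated and sorted."""
    return sorted(set(re.findall(r"[a-z0-9]+", text.lower())))


def with_search_tokens(doc, fields):
    """Return a copy of doc with a '<field>Tokens' array added for each listed string field."""
    enriched = dict(doc)
    for field in fields:
        value = doc.get(field)
        if isinstance(value, str):
            enriched[field + "Tokens"] = tokenize(value)
    return enriched


# Example: enrich a document before writing it to the container.
doc = with_search_tokens({"id": "1", "lastName": "Abc-Smith"}, ["lastName"])
print(doc["lastNameTokens"])  # ['abc', 'smith']
```

The original lastName value is left untouched, so exact-match queries keep working as before.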
Take the public github dataset as an example
SELECT
*
FROM
`bigquery-public-data.github_repos.commits`
LIMIT
2
There are column names like
difference.old_mode
but searching with:
column:difference.old_mode
shows no results.
So, in this case the period isn't actually part of the column name; it's an indication that you're dealing with a complex type (there's a record/struct column named difference, and within it a column named old_mode).
Per the search reference, no special syntax for complex schemas is documented.
A suggestion might be to leverage a logical AND operator like column:(difference,old_mode). It's not as precise as specifying the column relationship, but it should return the results you're interested in receiving.
I have seen a huge amount of data written to Cosmos DB from a Stream Analytics job on a particular day.
It was not supposed to write such a huge number of documents in a day. I have to check whether documents were duplicated on that particular day.
Is there any query/any way to find out duplicate records in cosmos DB?
It is possible if you know the properties to check for duplicates.
We had a nasty production issue causing many duplicate records as well.
When we contacted MS Support to help us identify the duplicate documents, they gave us the following query.
Bear in mind: properties A and B together define uniqueness in our case. So if two documents have the same values for A and B, they are duplicates.
You can then use the output of this query to, for example, delete the oldest ones but keep the most recent (based on _ts).
SELECT d.A, d.B FROM
(SELECT c.A, c.B, count(c._ts) AS counts FROM c
GROUP BY c.A, c.B) AS d
WHERE d.counts > 1
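The "delete the oldest but keep the most recent" step could be done client-side once the documents for the duplicated (A, B) pairs have been read back. The following is only a sketch under that assumption: it works on plain dictionaries, the function name `find_older_duplicates` is hypothetical, and the actual delete call is left out.

```python
from collections import defaultdict


def find_older_duplicates(docs, key_props=("A", "B")):
    """Group docs by key_props; in each group keep the newest by _ts and return the rest."""
    groups = defaultdict(list)
    for doc in docs:
        groups[tuple(doc.get(p) for p in key_props)].append(doc)

    to_delete = []
    for group in groups.values():
        if len(group) > 1:
            newest_first = sorted(group, key=lambda d: d["_ts"], reverse=True)
            to_delete.extend(newest_first[1:])  # everything except the newest
    return to_delete


docs = [
    {"id": "1", "A": "x", "B": 1, "_ts": 100},
    {"id": "2", "A": "x", "B": 1, "_ts": 200},
    {"id": "3", "A": "y", "B": 1, "_ts": 150},
]
print([d["id"] for d in find_older_duplicates(docs)])  # ['1']
```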
Is there any query/any way to find out duplicate records in cosmos DB?
Quick answer: yes. Use the DISTINCT keyword in the Cosmos DB SQL query and filter on _ts (the system-generated Unix timestamp: https://learn.microsoft.com/en-us/azure/cosmos-db/databases-containers-items#properties-of-an-item).
Something like:
Select distinct c.X,c.Y,c.Z....(all columns you want to check) from c where c._ts = particular day
Then you could delete the duplicate data using this bulk delete library: https://github.com/Azure/azure-cosmosdb-bulkexecutor-dotnet-getting-started/tree/master/BulkDeleteSample.
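Since _ts is stored in epoch seconds, "c._ts = particular day" in practice probably means a range covering that day. Here is a minimal sketch of computing that range, assuming the day is meant in UTC. The helper name `day_bounds_utc` and the parameter names @start and @end are made up for illustration, and c.X, c.Y stand in for the properties you want to check.

```python
from datetime import datetime, timedelta, timezone


def day_bounds_utc(year, month, day):
    """Return (start, end) epoch seconds for a UTC day; end is exclusive."""
    start = datetime(year, month, day, tzinfo=timezone.utc)
    end = start + timedelta(days=1)
    return int(start.timestamp()), int(end.timestamp())


start_ts, end_ts = day_bounds_utc(2020, 1, 1)

# Query text and parameters that a client could pass to its query call.
query = "SELECT DISTINCT c.X, c.Y FROM c WHERE c._ts >= @start AND c._ts < @end"
parameters = [
    {"name": "@start", "value": start_ts},
    {"name": "@end", "value": end_ts},
]
print(start_ts, end_ts)  # 1577836800 1577923200
```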
For example, if we have a category facet and it returns 5 different categories, then on clicking the first category the other categories are no longer available in the response. I want to implement multi-select facet search.
Appreciate your response.
For more info, I am referring to the same scenario as below:
https://feedback.azure.com/forums/263029-azure-search/suggestions/7762452-provide-multiselect-facets
The facets in the response are limited to the selected value, and multi-select faceting is not supported. I'd suggest voting for it here: https://feedback.azure.com/forums/263029-azure-search/suggestions/7762452-provide-multiselect-facets
A workaround is to send multiple queries to get facets and filtered results separately.
For example,
1. Keep all facets in the UI (or make another query to get all facets) after the first search query.
2. Make another search query after another facet is selected, provided that the application tracks which facets the user has selected.
If you want to filter results with multiple facet values, you can modify your filter as below:
$filter = search.in(country, 'USA,Canada,Mexico,Brasil,Chile,Argentina', ',')
The first parameter to the search.in function is the string field reference (or a range variable over a string collection field in the case where search.in is used inside an any or all expression). The second parameter is a string containing the list of values, separated by spaces and/or commas. If you need to use separators other than spaces and commas because your values include those characters, you can specify an optional third parameter to search.in.
This third parameter is a string where each character of the string, or subset of this string is treated as a separator when parsing the list of values in the second parameter.
For more information about OData expression syntax for filters and order-by clauses in Azure Search, please refer to this tutorial.
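To make the parameter description above concrete, here is a small sketch that assembles a search.in filter string from a list of selected values. The helper name `search_in_filter` is hypothetical. It always passes the separator explicitly as the third parameter and doubles single quotes, which is how OData string literals are normally escaped.

```python
def search_in_filter(field, values, delimiter=","):
    """Build a search.in(...) filter expression for the given field and values."""
    for value in values:
        if delimiter in value:
            raise ValueError(
                f"value {value!r} contains the delimiter {delimiter!r}; pick another one"
            )
    # OData string literals escape a single quote by doubling it.
    joined = delimiter.join(values).replace("'", "''")
    return f"search.in({field}, '{joined}', '{delimiter}')"


print(search_in_filter("country", ["USA", "Canada", "Mexico"]))
# search.in(country, 'USA,Canada,Mexico', ',')

# Values that contain commas need a different separator.
print(search_in_filter("city", ["Washington, D.C.", "Paris"], delimiter="|"))
# search.in(city, 'Washington, D.C.|Paris', '|')
```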
I've recently run into this limitation and my workaround was to run a separate query for each facet as suggested by @rudin above.
Let's say for example that your application has facets for Colour, Brand and Size. Your primary search query includes all three filters but doesn't return any facets. You then run an additional query ignoring any selected Colours, which will give you all available colour values for the chosen brands and sizes, and you do the same for the brand and size facets.
For the additional queries it's important to set the 'Size' property to 0 so no search results are returned - just the relevant facet.
By doing this and running these queries asynchronously the performance overhead is minimal in my case with 6 facets.
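The query plan described above can be sketched roughly as follows. This is an illustration only: the function names are hypothetical, and the dictionary keys (search, filter, facets, top) are modeled on the body of a REST search request, which is an assumption about how the queries are sent. In an SDK the same idea maps onto whatever options object it exposes (for example the 'Size' property mentioned above).

```python
def build_filter(selections, skip_facet=None):
    """AND together one search.in clause per facet that has selected values."""
    clauses = []
    for facet, values in selections.items():
        if facet == skip_facet or not values:
            continue
        joined = "|".join(values).replace("'", "''")
        clauses.append(f"search.in({facet}, '{joined}', '|')")
    return " and ".join(clauses)


def plan_queries(search_text, selections):
    """Return the main results query plus one facet-only query per facet."""
    main = {"search": search_text}
    main_filter = build_filter(selections)
    if main_filter:
        main["filter"] = main_filter

    facet_queries = []
    for facet in selections:
        # Ignore this facet's own selections so all of its values stay available.
        query = {"search": search_text, "facets": [facet], "top": 0}
        facet_filter = build_filter(selections, skip_facet=facet)
        if facet_filter:
            query["filter"] = facet_filter
        facet_queries.append(query)
    return main, facet_queries


selections = {"Colour": ["Red", "Blue"], "Brand": ["Acme"], "Size": []}
main, facet_queries = plan_queries("shoes", selections)
print(main["filter"])
for q in facet_queries:
    print(q["facets"], q.get("filter"))
```

The facet queries are independent of each other, so they can be issued concurrently with the main query.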
I am developing an application (Python 3.x) in which I need to collect the first 13,000 results of a CSE query using one search keyword (from result index 1 to 13,000). For a free version of CSE JSON API (I have tried it), I can only get the first 10 results per query or 100 results per day (by repeating the same query while incrementing the index) otherwise it gives an error (HttpError 400.....returned Invalid Value) when the result index exceeds 100. Is there any option (paid/free) that I can deploy to achieve the objective?
The Custom Search JSON API is limited to a maximum depth of 100 results per query, so you'll need to find a different API or devise a way to modify the query so that the result set is divided into smaller parts.
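To show what the depth limit means for paging, the sketch below lists the start indices that can actually be requested for one query, assuming pages of 10 results and the 100-result cap described above. The function name `cse_start_indices` is hypothetical.

```python
def cse_start_indices(wanted, page_size=10, max_depth=100):
    """Return the 1-based start indices reachable for a single query."""
    reachable = min(wanted, max_depth)
    return list(range(1, reachable + 1, page_size))


# Asking for 13,000 results still yields only the first ten pages.
print(cse_start_indices(13000))
# [1, 11, 21, 31, 41, 51, 61, 71, 81, 91]
```

Anything beyond these pages has to come from a differently worded or narrowed query, not from a larger start index.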
If I have to pass a list of numbers obtained from one select statement to another select statement, both working on the same collection, how can I do that?
Query:
SELECT VALUE student.studentId
FROM student
where student.acctType="student"
This gives me a result:
["1","2","3","4"]
The query I would run if I had those values:
SELECT student.firstName,
student.lastName
from student
where student.acctType="student" AND
student.studentId IN ("1","2","3","4")
This is what I tried, but it did not work:
SELECT student.firstName,
student.lastName
from student
where student.acctType="student" AND
student.studentId IN ( SELECT VALUE student1.studentId
FROM student1
where student1.acctType="student")
DocumentDB has an SQL-familiar query syntax but it's still a NoSQL database. As such, it has no support for inter-document joins and no support for sub-queries. That's the bad news.
The good news is that DocumentDB allows you to write stored procedures (sprocs) to accomplish everything you could with full SQL and much more. You could write a sproc that runs the query returning the list of numbers, then uses that list of numbers to compose another query to get the data you need.
That said, for the simple use case you describe, you could accomplish the same thing client-side with two round trips.
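A minimal sketch of the two-round-trip approach is shown below. It assumes a hypothetical `run_query(query, parameters)` callable that executes a query against the collection and returns a list of results; how that maps onto a particular SDK is left open. The list-of-name/value-dictionaries parameter format is also an assumption about the client being used.

```python
def build_in_query(ids):
    """Compose the second query with one parameter per id from the first query."""
    names = [f"@id{i}" for i in range(len(ids))]
    query = (
        "SELECT student.firstName, student.lastName FROM student "
        'WHERE student.acctType = "student" '
        f"AND student.studentId IN ({', '.join(names)})"
    )
    parameters = [{"name": n, "value": v} for n, v in zip(names, ids)]
    return query, parameters


def fetch_names(run_query):
    """Round trip 1 fetches the ids; round trip 2 fetches the names for those ids."""
    ids = run_query(
        'SELECT VALUE student.studentId FROM student WHERE student.acctType = "student"',
        [],
    )
    if not ids:
        return []  # an empty IN () list would not be a valid query
    query, parameters = build_in_query(ids)
    return run_query(query, parameters)


query, parameters = build_in_query(["1", "2", "3", "4"])
print(query)
print(parameters)
```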