I have a use case where, for a given result value, I want to do a reverse lookup of all the defined search conditions that would return this value as a result.
I have a set of search conditions defined in a table as a key-value list. Each entry in this table is a search query. I also have an arbitrary value in the dataset that could be the result of any of the search entries defined in the table. I want to query that table so that, for this value, I get every search query whose results would include it.
The search table consists of the fields search_conditions and search_table, among others.
The schema would look like this:
Search_Table
id (long)
search_table_id (long)
search_conditions (json array as text)
This is the value of one such search condition:
[
{
"key": "name",
"operator": "equals",
"value": "jeff"
},
{
"key": "age",
"operator": "between",
"value": [
20,
40
]
}
]
The value I have to search for could be, say, an arbitrary user: {"name": "mr x", "age": 12}.
This may not be strictly a technology question, but its solution may require technology. Any help will be appreciated. My main concern is optimization, as this has to be done in real time.
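To make the problem concrete, here is a minimal brute-force sketch in Python. It assumes only the two operators shown above (`equals` and `between`, the latter treated as inclusive) and that all conditions within one entry are ANDed together. The function names and the in-memory `search_table` are hypothetical. It is a baseline to measure against, not an optimized real-time solution.

```python
import json


def condition_matches(condition, record):
    """Return True if a single condition holds for the record."""
    value = record.get(condition["key"])
    if value is None:
        return False
    operator = condition["operator"]
    if operator == "equals":
        return value == condition["value"]
    if operator == "between":
        low, high = condition["value"]
        return low <= value <= high  # assumption: bounds are inclusive
    raise ValueError(f"unsupported operator: {operator}")


def matching_searches(search_table, record):
    """Return ids of entries whose conditions all hold (assumed AND semantics)."""
    matches = []
    for row in search_table:
        conditions = json.loads(row["search_conditions"])
        if all(condition_matches(c, record) for c in conditions):
            matches.append(row["id"])
    return matches


# Hypothetical rows mirroring the schema above.
search_table = [
    {"id": 1, "search_conditions": json.dumps([
        {"key": "name", "operator": "equals", "value": "jeff"},
        {"key": "age", "operator": "between", "value": [20, 40]},
    ])},
    {"id": 2, "search_conditions": json.dumps([
        {"key": "age", "operator": "between", "value": [10, 15]},
    ])},
]

print(matching_searches(search_table, {"name": "mr x", "age": 12}))  # prints [2]
```

This scans every entry for each lookup, so its cost grows linearly with the number of stored searches. Any optimization would have to reduce the set of entries that need to be evaluated.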
Related
Assume I have the following Cosmos DB container with the possible doc type partitions:
{
"id": <string>,
"partitionKey": <string>, // Always "item"
"name": <string>
}
{
"id": <string>,
"partitionKey": <string>, // Always "group"
"items": <array[string]> // Always an array of ids for items in the "item" partition
}
I have the id of a "group" document, but I do not have the document itself. What I would like to do is perform a query which gives me all "item" documents referenced by the "group" document.
I know I can perform two queries: 1) Retrieve the "group" document, 2) Perform a query with IN clause on the "item" partition.
As I don't care about the "group" document other than getting the list of ids, is it possible to construct a single query to get me all the "item" documents I want with just the "group" document id?
You'll need to perform two queries, as there are no joins between separate documents. Even though there is support for subqueries, only correlated subqueries are currently supported (meaning, the inner subquery is referencing values from the outer query). Non-correlated subqueries are what you'd need.
Note that, although you only care about the list of ids, you don't need to retrieve the entire group document. You can project just the items property and then use it in your second query with something like array_contains():
SELECT VALUE g.items
FROM g
WHERE g.id="1"
AND g.partitionKey="group"
SELECT VALUE i.name
FROM i
WHERE array_contains(<items-from-prior-query>,i.id)
AND i.partitionKey="item"
This documentation page clarifies the two subquery types and support for only correlated subqueries.
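The two-step approach could be wired together roughly as follows in Python. This is a sketch under assumptions: `container` is expected to behave like the azure-cosmos SDK's container client (its `query_items` method accepts `query`, `parameters` and `partition_key`), the container's partition key path is assumed to be `/partitionKey`, and `fetch_group_item_names` and `FakeContainer` are hypothetical names. The fake exists only so the example runs without an Azure account.

```python
def fetch_group_item_names(container, group_id):
    """Run the two queries in sequence and return the matching item names."""
    # Query 1: project only the "items" array of the group document.
    id_lists = list(container.query_items(
        query=(
            "SELECT VALUE g.items FROM g "
            "WHERE g.id = @groupId AND g.partitionKey = 'group'"
        ),
        parameters=[{"name": "@groupId", "value": group_id}],
        partition_key="group",
    ))
    if not id_lists or not id_lists[0]:
        return []  # group not found, or it references no items

    # Query 2: pass the id array as a parameter to ARRAY_CONTAINS.
    return list(container.query_items(
        query=(
            "SELECT VALUE i.name FROM i "
            "WHERE ARRAY_CONTAINS(@itemIds, i.id) AND i.partitionKey = 'item'"
        ),
        parameters=[{"name": "@itemIds", "value": id_lists[0]}],
        partition_key="item",
    ))


class FakeContainer:
    """Hypothetical in-memory stand-in; it ignores the query text."""

    def query_items(self, query, parameters=None, partition_key=None):
        if partition_key == "group":
            return iter([["item-1", "item-2"]])  # one row: the projected array
        return iter(["First item", "Second item"])


print(fetch_group_item_names(FakeContainer(), "1"))
```

Passing the ids as a query parameter avoids building the array literal by string concatenation.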
I need a query that returns only the documents matching every word in a list. For example, if I use
select c from c join (SELECT distinct VALUE c.id FROM c JOIN word IN c.words WHERE word in('word1','word2') and tag in('motorcycle')) ORDER BY c._ts desc
it returns both documents. I want to retrieve only the first one, because it matches both words and not just one.
Document 1
"c": {
"id": "d0f1723c-0a55-454a-9cf8-3884f2d8d61a",
"words": [
"word1",
"word2",
"word3"
]}
Document 2
"c": {
"id": "d0f1723c-0a55-454a-9cf8-3884f2d8d61a",
"words": [
"word1",
"word4",
"word5"
]}
You should be able to cover this with two ARRAY_CONTAINS expressions in your WHERE clause (and no need for a JOIN):
SELECT c.id FROM c
WHERE ARRAY_CONTAINS(c.words, 'word1')
AND ARRAY_CONTAINS(c.words, 'word2')
This should return the id of your first document.
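If the list of required words varies, the same WHERE clause can be generated. The following Python sketch builds one ARRAY_CONTAINS expression per word and joins them with AND. The helper name `build_all_words_query` and the `@word0`, `@word1` parameter names are illustrative, not part of any SDK.

```python
def build_all_words_query(words):
    """Build a parameterized query requiring every word to be in c.words."""
    if not words:
        raise ValueError("at least one word is required")
    clauses = []
    parameters = []
    for index, word in enumerate(words):
        name = f"@word{index}"
        clauses.append(f"ARRAY_CONTAINS(c.words, {name})")
        parameters.append({"name": name, "value": word})
    query = "SELECT c.id FROM c WHERE " + " AND ".join(clauses)
    return query, parameters


query, parameters = build_all_words_query(["word1", "word2"])
print(query)
print(parameters)
```

The resulting query text and parameter list can then be passed to whichever client library is in use.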
This query costs 265 RUs:
SELECT top 1 * FROM c
WHERE c.CollectPackageId = 'd0613cbb-492b-4464-b66b-3634b5571826'
ORDER BY c.StartFetchDateTimeUtc DESC
StartFetchDateTimeUtc is a string property, serialized by using the Cosmos API
This query costs 5 RUs:
SELECT top 1 * FROM c
WHERE c.CollectPackageId = 'd0613cbb-492b-4464-b66b-3634b5571826'
ORDER BY c._ts DESC
_ts is a built in field, a Unix-based numeric timestamp.
Example result (only including this field and _ts):
"StartFetchDateTimeUtc": "2017-08-08T03:35:04.1654152Z",
"_ts": 1502163306
The index is in place and follows the suggestions and tutorials on how to configure a sortable string/timestamp. It looks like this:
{
"path": "/StartFetchDateTimeUtc/?",
"indexes": [
{
"kind": "Range",
"dataType": "String",
"precision": -1
}
]
}
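For reference, the two sample values shown earlier in this question can be compared directly. The Python sketch below converts the `_ts` value (seconds since the Unix epoch) to UTC and parses the string timestamp; the variable names are illustrative. It only shows that the two fields hold slightly different instants in different representations, and does not by itself explain the RU difference.

```python
from datetime import datetime, timezone

ts = 1502163306  # the _ts value from the example result
last_modified = datetime.fromtimestamp(ts, tz=timezone.utc)
print(last_modified.isoformat())  # 2017-08-08T03:35:06+00:00

# Python's %f accepts at most six fractional digits, so the text is cut
# after the sixth digit (dropping the seventh digit and the trailing "Z").
start_fetch_text = "2017-08-08T03:35:04.1654152Z"
start_fetch = datetime.strptime(
    start_fetch_text[:26], "%Y-%m-%dT%H:%M:%S.%f"
).replace(tzinfo=timezone.utc)

# Gap between the application timestamp and the system timestamp, in seconds.
gap_seconds = (last_modified - start_fetch).total_seconds()
print(gap_seconds)
```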
According to this article, the following variables affect the RU charge: item size, item property count, data consistency, indexed properties, document indexing, query patterns, and script usage.
So it is very strange that ordering by a different property costs a different number of RUs.
I also created a test demo on my side (with your index and the same document properties) and inserted 1,000 records into DocumentDB. The two queries cost the same number of RUs. I suggest you create a new collection and test again.
The results were as follows:
[Screenshot: Order by StartFetchDateTimeUtc]
[Screenshot: Order by _ts]
I am trying to extract a specific value from an array property in the Stream Analytics query language.
My data looks as follows:
"context": {
"custom": {
"dimensions": [{
"MacAddress": "ma"
},
{
"IpAddress": "ipaddr"
}]
}
}
I am trying to obtain a result that has "MacAddress", "IpAddress" as column titles and "ma", "ipaddr" as rows.
I am currently achieving this with this query:
SELECT
GetRecordPropertyValue(GetArrayElement(MySource.context.custom.dimensions, 0), 'MacAddress') AS MacAddress,
GetRecordPropertyValue(GetArrayElement(MySource.context.custom.dimensions, 1), 'IpAddress') AS IpAddress
FROM
[ffapi-track-events] AS MySource
I have also tried CROSS APPLY, but so far with no luck. This is the CROSS APPLY query:
SELECT
flat.ArrayValue.MacAddress as MacAddress,
flat.ArrayValue.IpAddress as IpAddress
FROM
[ffapi-track-events] as MySource
CROSS APPLY GetArrayElements(MySource.context.custom.dimensions) as flat
This one produces two rows instead of one:
MacAddress, IpAddress
ma ,
, ipaddr
so the flattening is precisely what I'm missing when I write it like that.
I would like to avoid hardcoding the index 0, as it's not guaranteed that "MacAddress" won't switch places with "IpAddress". So I need something like FindElementInArray by condition, or some means to join with the dimensions array.
Is there such a thing?
Thank you.
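To pin down the order-independent result being asked for, here is the intended transformation in plain Python. This is not Stream Analytics syntax, and whether the query language can express it is the open question; `merge_dimensions` is a hypothetical name.

```python
def merge_dimensions(dimensions):
    """Merge a list of single-key records into one flat record."""
    merged = {}
    for record in dimensions:
        merged.update(record)  # a repeated key would overwrite the earlier value
    return merged


event = {"context": {"custom": {"dimensions": [
    {"MacAddress": "ma"},
    {"IpAddress": "ipaddr"},
]}}}

row = merge_dimensions(event["context"]["custom"]["dimensions"])
print(row["MacAddress"], row["IpAddress"])  # ma ipaddr
```

Because values are looked up by property name after merging, the position of each record in the array does not matter.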
I have stored the following JSON document in Azure DocumentDB:
"JobId": "04e63d1d-2af1-42af-a349-810f55817602",
"JobType": 3,
"Properties": [
{
"Key": "Value1",
"Value": "testing1"
},
{
"Key": "Value",
"Value": "testing2"
}
]
When I query the document back, this query works without problems:
Select f.id,f.Properties, C.Key from f Join C IN f.Properties where C.Key = 'Value1'
However, when I try this query:
Select f.id,f.Properties, C.Key from f Join C IN f.Properties where C.Value = 'testing1'
I get an error that the query cannot be computed. I assume this is because 'VALUE' is a reserved keyword in the query language.
I cannot rely on a specific order in the property array, because different subclasses can add different properties in different orders as they need them.
Does anybody have a suggestion for how I can still complete this query?
To escape keywords in DocumentDB, you can use the [] syntax. For example, the above query would be:
Select f.id,f.Properties, C.Key from f Join C IN f.Properties where C["Value"] = 'testing1'
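When query text is assembled in application code, one option is to always emit bracket notation for property names, so it does not matter whether a name collides with a keyword. The Python sketch below does this with a hypothetical helper, `bracket_path`. It assumes that JSON-style double-quoted strings are acceptable inside the brackets, which holds for simple names like the ones here.

```python
import json


def bracket_path(alias, *property_names):
    """Build a property reference such as C["Value"] using bracket notation."""
    return alias + "".join(f"[{json.dumps(name)}]" for name in property_names)


query = (
    "SELECT f.id, f.Properties, C.Key FROM f JOIN C IN f.Properties "
    f"WHERE {bracket_path('C', 'Value')} = @value"
)
parameters = [{"name": "@value", "value": "testing1"}]
print(query)
```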