Query Cosmos DB: get last element in array

I have a document like this:
{ "id": ....,
"Title": "title",
"ZipCodes": [
{
"Code": "code01",
"Name": "Name01"
},
{
"Code": "code02",
"Name": "Name02"
},
{
"Code": "code03",
"Name": "Name03"
} ],
"_rid": .......,
"_self": .......,
"_etag": ......,
"_attachments": "attachments/",
"_ts": ......
}
I used this query:
select c.id, c.ZipCodes[ARRAY_LENGTH (c.ZipCodes) -1] as ZipCodes from c
But I got an error. How can I query the last element of ZipCodes in Cosmos DB?

You can use ARRAY_SLICE for this. When passed -1 it returns an array containing the last element of the original array. Then index into that with [0] to get the single element contained (i.e. the zip code itself.)
SELECT c.id,
ARRAY_SLICE(c.ZipCodes,-1)[0] AS LastZipCode
FROM c
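To see what the ARRAY_SLICE(...)[0] expression computes, here is a small Python sketch that mimics it on an in-memory copy of the sample document. It only emulates the behaviour described above (a negative start counts from the end of the array); last_zip_code is a hypothetical helper, not part of any Cosmos DB SDK, and the "id" value is made up because the original elides it.

```python
def last_zip_code(doc):
    """Mimic ARRAY_SLICE(c.ZipCodes, -1)[0] on a plain dict."""
    # Like ARRAY_SLICE(..., -1): a list holding at most the final element.
    tail = doc.get("ZipCodes", [])[-1:]
    # Like [0]; None stands in for "no value" when the array is empty.
    return tail[0] if tail else None


doc = {
    "id": "1",  # hypothetical id, elided in the original document
    "Title": "title",
    "ZipCodes": [
        {"Code": "code01", "Name": "Name01"},
        {"Code": "code02", "Name": "Name02"},
        {"Code": "code03", "Name": "Name03"},
    ],
}

print(last_zip_code(doc))
```

Slicing before indexing is what keeps the empty-array case from raising an error in this sketch.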

I don't think a plain SELECT can return the last element of the subdocument array directly. I think you should call a UDF and narrow the documents with a WHERE condition, as follows:
SELECT value udf.sortZipCode(c.ZipCodes)
from c where c.id=2 and c.Title='title'
Here is the user-defined function (UDF) that the query calls:
function sortZipCode(zipCodes) {
    function compareTimeStamps(a, b) {
        return a.TimeStamp - b.TimeStamp; // implement your own comparison logic here
    }
    return zipCodes.sort(compareTimeStamps); // sort the array that was passed in
}

"But I got an error. How can I query the last element of ZipCodes in Cosmos DB?"
I agree with Sajeetharan that a UDF can do this, and it is easy to write.
UDF code (registered under the id GetLastRecord, which the query below calls):
function userDefinedFunction(zipcodes){
return zipcodes[zipcodes.length-1];
}
SQL query:
SELECT c.id,c.Title,udf.GetLastRecord(c.ZipCodes) as ZipCodes FROM c

Related

Cosmos Db (need any sort of iteration mechanism)

I want to check whether all the objects in array A of my document have the same value. For example:
{
"id": "1234-wrew-1234314",
"_ts": 1672840679,
"A": [
{
"Id": "123",
"values": 167273168512
},
{
"Id": "1234",
"values": 1672731685
},
{
"Id": "123456",
"values": 1673461685
}
]
}
Given this document, I want to check whether all the values are the same. Is there any way to do this?
What I have already tried:
select EXISTS(
SELECT VALUE n
FROM n IN c.A
WHERE c.A[0].values= c.A[1].values) as a
from c
where c.id ="1234-wrew-1234314"
It works fine if there are only 2 records in A, but I want a generic solution that handles any number of records.
I also tried ARRAY_CONTAINS, but it did not work.
Thanks in advance.
Does this do what you need?
SELECT d.MaxValue = d.MinValue ? 'All the same' : 'Not the same'
FROM
(
SELECT MAX(a.values) AS MaxValue ,
MIN(a.values) AS MinValue
FROM c
JOIN a IN c.A
WHERE c.id = "1234-wrew-1234314"
) d
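The MAX/MIN comparison works because all elements are equal exactly when the largest and the smallest value coincide. A minimal Python sketch of the same check, run client-side on the sample document (the helper name all_values_equal is made up for illustration, and treating an empty array as "all the same" is an assumption of this sketch):

```python
def all_values_equal(doc, array_key="A", field="values"):
    """Return True when every element of doc[array_key] has the same `field`."""
    values = [item[field] for item in doc.get(array_key, [])]
    if not values:
        return True  # assumption: an empty array counts as "all the same"
    # Equal maximum and minimum means no element differs from the others.
    return max(values) == min(values)


doc = {
    "id": "1234-wrew-1234314",
    "_ts": 1672840679,
    "A": [
        {"Id": "123", "values": 167273168512},
        {"Id": "1234", "values": 1672731685},
        {"Id": "123456", "values": 1673461685},
    ],
}

print("All the same" if all_values_equal(doc) else "Not the same")
```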

Moving specific Index Data into a new Index within Elasticsearch

I have several million docs that I need to move into a new index, but there is a condition on which docs should flow into it. Say I have a field named offsets that needs to be queried against. The values I need to query for in that field are [1,7,99,32, ....., 10000432] (a very large list).
Does anyone have thoughts on how I can move the specific docs with those values into a new Elasticsearch index? My first thought was reindexing with a query, but there is no pattern in the offsets list.
Would it be a Python loop appending each doc to a new index? Looking for any guidance.
Thanks
Are the documents really large, or can you add them to a JSONL file for bulk ingestion?
In what form is the selector list, the one shown as "[1,7,99,32, ....., 10000432]"?
I'd do it in Pandas, but here is an idea in ES parlance.
Whatever you do, use the _bulk API, or the job will never finish.
You can run a query based on a file, as per
GET my_index/_search?_file="myquery_file"
You can put all the ids into a file, myquery_file, as below:
{
"query": {
"ids" : {
"values" : ["1", "4", "100"]
}
},
"format": "jsonl"
}
and output the result as JSONL to ingest.
You can do the same with the reindex API:
{
"source": {
"index": "source",
"query": {
"match": {
"company": "cat"
}
}
},
"dest": {
"index": "dest",
"routing": "=cat"
}
}
Unfortunately, I was facing a time crunch and had to throw in a custom loop to query a very specific subset of indices.
import pandas as pd

# `client` is assumed to be an already-connected elasticsearch.Elasticsearch instance.
df = pd.read_csv('C://code//part_1_final.csv')
offsets = df['OFFSET'].tolist()
# Offsets are the "unique" values I need to identify the docs by. There is no pattern in these values, so I must go one by one.
missedDocs = []
for i in offsets:
    print(i)
    try:
        client.reindex({
            "source": {
                "index": "<source_index>",
                "query": {
                    "bool": {
                        "must": [
                            {"match": {"<index_field_1>": "1"}},
                            {"match": {"<index_field_that_needs_values_to_match>": i}}
                        ]
                    }
                }
            },
            "dest": {
                "index": "<dest_index>"
            }
        })
    except KeyError:
        print('error')
        # missedDocs.append(query)
        print('DOC ERROR')
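One way to avoid a separate reindex call per offset is to group the offsets into batches and send each batch as a single terms query. The sketch below only builds the request bodies, so it runs without a cluster. The index and field names are placeholders, build_reindex_bodies is a hypothetical helper, and the default batch size of 10,000 is an arbitrary choice kept below Elasticsearch's default limit of 65,536 terms per terms query.

```python
def chunked(items, size):
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_reindex_bodies(offsets, source_index, dest_index, field, batch_size=10_000):
    """Build one reindex request body per batch of offsets."""
    bodies = []
    for batch in chunked(list(offsets), batch_size):
        bodies.append({
            "source": {
                "index": source_index,
                # A terms query matches docs whose field equals any listed value.
                "query": {"terms": {field: batch}},
            },
            "dest": {"index": dest_index},
        })
    return bodies


# Tiny demonstration with placeholder names and a batch size of 3.
bodies = build_reindex_bodies([1, 7, 99, 32], "source_index", "dest_index", "offset", batch_size=3)
print(len(bodies))
```

Each body could then be passed to the same client.reindex call used in the loop above, which turns millions of requests into a few hundred.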

How can I obtain a document from a Cosmos DB using a field in an array as a filter?

I have a Cosmos DB with documents that look like the following:
{
"name": {
"productName": "someProductName"
},
"identifiers": [
{
"identifierCode": "1234",
"identifierLabel": "someLabel1"
},
{
"identifierCode": "432",
"identifierLabel": "someLabel2"
}
]
}
I would like to write a SQL query that returns an entire document, using "identifierLabel" as the filter.
I attempted to write a query based on an example I found from the following blog:
SELECT c,t AS identifiers
FROM c
JOIN t in c.identifiers
WHERE t.identifierLabel = "someLabel2"
However, when the result is returned, it appends the following to the end of the document:
{
"name": {
"productName": "someProductName"
},
"identifiers": [
{
"identifierCode": "1234",
"identifierLabel": "someLabel1"
},
{
"identifierCode": "432",
"identifierLabel": "someLabel2"
}
]
},
{
"identifierCode": "432",
"identifierLabel": "someLabel2"
}
How can I avoid this and get the result that I desire, i.e. the entire document with nothing appended to it?
Thanks in advance.
Using ARRAY_CONTAINS(), you should be able to do something like this to retrieve the entire document, without any need for a self-join:
SELECT *
FROM c
where ARRAY_CONTAINS(c.identifiers, {"identifierLabel":"someLabel2"}, true)
Note that ARRAY_CONTAINS() can search for either scalar values or objects. Passing true as the third parameter enables partial matching: an object in the array matches if it contains the specified properties, even if it has other properties as well. So the query above matches any document whose identifiers array has an object with identifierLabel set to "someLabel2", and it should return the original document unchanged, avoiding the issue you ran into with the self-join.
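A rough Python emulation of how the third argument of ARRAY_CONTAINS() changes the match may help. This is a simplification for illustration, not how the engine is implemented: it compares only top-level properties, and array_contains is a hypothetical stand-in for the SQL function.

```python
def array_contains(array, expr, partial=False):
    """Emulate ARRAY_CONTAINS(array, expr, partial) for scalars and flat objects."""
    for element in array:
        if partial and isinstance(element, dict) and isinstance(expr, dict):
            # Partial match: every property in expr must appear in the element
            # with the same value; extra properties on the element are ignored.
            if all(key in element and element[key] == value for key, value in expr.items()):
                return True
        elif element == expr:
            # Full match: the element must equal expr exactly.
            return True
    return False


identifiers = [
    {"identifierCode": "1234", "identifierLabel": "someLabel1"},
    {"identifierCode": "432", "identifierLabel": "someLabel2"},
]

print(array_contains(identifiers, {"identifierLabel": "someLabel2"}, True))
print(array_contains(identifiers, {"identifierLabel": "someLabel2"}))
```

Without the partial flag the second call finds nothing, because no array element is exactly equal to an object holding only identifierLabel.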

Azure CosmosDB (SQL) - How to query for objects that contain a list, returning an object when any element in its list matches a condition

From the database we need to return all objects that have a ClosedDate within a date range. The ClosedDate property is on a child object contained in a list within the object. I want to return the object if any ClosedDate in that list is within the date range. Currently I'm only able to construct a Cosmos query that returns the object when all ClosedDates are in the range, but I need it returned when any are in the range.
Current Query
IQueryable<ServiceRepairOrder> query = this.Client.CreateDocumentQuery<ServiceRepairOrder>(UriFactory.CreateDocumentCollectionUri(DatabaseName, ContainerName()), queryOptions)
.Where(ro => ro.AccountId == this.AccountID)
.Where(ro => ro.Items.Any(li => li.ClosedDate >= start && li.ClosedDate <= end) );
Object JSON Example
{
"id": "45144",
"Type": "ServiceRepairOrder",
"AccountID": "account1",
"Items": [
{
"ClosedDate": "someDateInRange",
"Id": "itemId1",
"Key": "value1"
},
{
"ClosedDate": "someDateOutOfRange",
"Id": "itemId2",
"Key": "value2"
}
]
}
Can this help you?
SELECT distinct c.id FROM c JOIN t IN c.Items where t.ClosedDate>="2020-11-1" and t.ClosedDate<="2020-11-30"
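The JOIN produces one row per (document, item) pair, the WHERE clause keeps the pairs whose item is in range, and DISTINCT collapses them back to one id per document, which gives the "any item matches" behaviour. The Python sketch below walks through the same steps on in-memory data. The dates and the second and third documents are invented for illustration, and the string comparison assumes zero-padded ISO 8601 dates such as "2020-11-01".

```python
def ids_with_any_item_in_range(docs, start, end):
    """Return ids of docs that have at least one item with ClosedDate in [start, end]."""
    ids = []
    for doc in docs:                                  # FROM c
        for item in doc.get("Items", []):             # JOIN t IN c.Items
            if start <= item["ClosedDate"] <= end:    # WHERE t.ClosedDate between the bounds
                if doc["id"] not in ids:              # DISTINCT c.id
                    ids.append(doc["id"])
    return ids


docs = [
    {"id": "45144", "Items": [{"ClosedDate": "2020-11-15"}, {"ClosedDate": "2021-01-05"}]},
    {"id": "45145", "Items": [{"ClosedDate": "2020-11-03"}, {"ClosedDate": "2020-11-20"}]},
    {"id": "45146", "Items": [{"ClosedDate": "2019-05-01"}]},
]

print(ids_with_any_item_in_range(docs, "2020-11-01", "2020-11-30"))
```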

CosmosDB sort results by a value in an array

I have some CosmosDB documents like the following:
{
"ProductId": 1,
"Status": true,
"Code": "123456",
"IsRecall": false,
"ScanLog": [
{
"Location": {
"type": "Point",
"coordinates": [
13.5957758,
42.7111538
]
},
"TimeStamp": 201602160957190600,
"ScanType": 0,
"UserId": "1004"
},
{
"Location": {
"type": "Point",
"coordinates": [
13.5957907,
42.7111359
]
},
"TimeStamp": 201602161246336640,
"ScanType": 0,
"UserId": "1004"
}
]
}
How can I order the query results by the TimeStamp property? I've tried using this query
SELECT c.Code, b.TimeStamp FROM c JOIN b IN c.ScanLog ORDER BY b.TimeStamp
but I receive this error
Order-by over correlated collections is not supported.
What is the correct way to do this?
JOINs with ORDER BY are currently not supported.
However, here is a user defined function (UDF) that will do the trick:
function sortScanLog (scanLog) {
function compareTimeStamps(a, b) {
return a.TimeStamp - b.TimeStamp;
}
return scanLog.sort(compareTimeStamps);
}
You use it with a query like this:
SELECT c.ProductId, udf.sortScanLog(c.ScanLog) as ScanLog FROM c
If you want the opposite sort order, simply swap a and b, so the signature of the compareTimeStamps inner function becomes:
function compareTimeStamps(b, a)
Alternatively, you can sort client-side after the results are returned.
Right now, ORDER BY clauses mixed with JOINs are not supported. The engine can use indexed properties for JOIN operations but cannot reorder results based on the JOIN result.
You'd have to go with something like what Larry offered, or run the JOIN in the query and sort in your own code once the results arrive. If you use C#, for example, you can sort them with LINQ.
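As a sketch of the client-side option, the Python snippet below flattens each document's ScanLog into (Code, TimeStamp) rows, which is what the JOIN would return, and then sorts them locally. The function name scan_rows_sorted and the second document are invented for illustration; the first document reuses the timestamps from the question.

```python
def scan_rows_sorted(docs, descending=False):
    """Flatten ScanLog entries into rows and sort them by TimeStamp."""
    rows = [
        {"Code": doc["Code"], "TimeStamp": scan["TimeStamp"]}
        for doc in docs
        for scan in doc.get("ScanLog", [])
    ]
    # sorted() returns a new list, leaving the fetched documents untouched.
    return sorted(rows, key=lambda row: row["TimeStamp"], reverse=descending)


docs = [
    {
        "Code": "123456",
        "ScanLog": [
            {"TimeStamp": 201602161246336640},
            {"TimeStamp": 201602160957190600},
        ],
    },
    # Hypothetical second document, added only to show ordering across documents.
    {"Code": "654321", "ScanLog": [{"TimeStamp": 201602161100000000}]},
]

for row in scan_rows_sorted(docs):
    print(row["Code"], row["TimeStamp"])
```

Passing descending=True reverses the order, which plays the same role as swapping a and b in the UDF comparator.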
