CosmosDb query to return arrays

I have the following data in my Collection
{
"id": "00000000-0000-0000-454c-4b74472b01d8",
"GroupId": 1,
"Location": "London",
"Status": "Ok"
},
{
"id": "d129adeb-d1bf-4a89-afe3-93e3f60589fb",
"GroupId": 1,
"Location": "Liverpool",
"Status": "Ok"
},
{
"id": "85ecf875-0e32-40b5-823a-a2545694f9b6",
"GroupId": 2,
"Location": "Manchester",
"Status": "Nok"
}
I need to build a query that returns all possible values per group, to be used for filtering.
For example, for "GroupId": 1 I need a result like
{
"Location": [
"London",
"Liverpool"
],
"Status": [
"Ok"
]
}
for "GroupId": 2 the response:
{
"Location": [
"Manchester"
],
"Status": [
"Nok"
]
}
Could you please help me build such a query? I don't even know whether it is possible with CosmosDb.
So far I have tried something like this, but it doesn't work:
select
(
select VALUE c.Location
FROM c
WHERE c.GroupId = 1
GROUP BY c.Location
) as Location,
(
select VALUE c.Status
FROM c
WHERE c.GroupId = 1
GROUP BY c.Status
) as Status
from c
WHERE c.GroupId = 1
and this
select
[
(SELECT VALUE [c.Location] from c)
] as Location,
[
(SELECT VALUE [c.Status] from c)
] as Status
from c
where c.GroupId = 1
Please help or suggest how to solve that. Thank you in advance.

This is not possible with the way your data is modeled.
The ARRAY expression lets you build such arrays in a subquery, but only from arrays that live inside a single document. It does not work when the values span multiple documents, as is the case here.
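One possible workaround, sketched here under assumptions rather than taken from the answer, is to run a plain per-group query (for example SELECT c.GroupId, c.Location, c.Status FROM c) and collect the distinct values in application code. The Python below works on an in-memory list that stands in for the query results; `distinct_values_by_field` is a hypothetical helper name, not part of any Cosmos DB SDK.

```python
# Hypothetical sample standing in for documents returned by a query such as
#   SELECT c.GroupId, c.Location, c.Status FROM c
docs = [
    {"GroupId": 1, "Location": "London", "Status": "Ok"},
    {"GroupId": 1, "Location": "Liverpool", "Status": "Ok"},
    {"GroupId": 2, "Location": "Manchester", "Status": "Nok"},
]


def distinct_values_by_field(documents, group_id, fields=("Location", "Status")):
    """Collect the distinct values of each field for one GroupId (first-seen order)."""
    result = {field: [] for field in fields}
    for doc in documents:
        if doc.get("GroupId") != group_id:
            continue
        for field in fields:
            value = doc.get(field)
            # Skip missing fields and values already collected.
            if value is not None and value not in result[field]:
                result[field].append(value)
    return result


print(distinct_values_by_field(docs, 1))
print(distinct_values_by_field(docs, 2))
```

The aggregation happens entirely on the client, so it only suits groups small enough to fetch in full.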

Related

How to SQL query a nested list and array in a MS Azure CosmosDB?

I have the following (Azure) CosmosDB (sub) structure, that has 2 nested arrays:
{
"id": "documentTypes",
"SomeThing": "SomeThing",
"configuration": [
{
"language": "en",
"list": [
{
"id": 1,
"name": "Supporting Documents"
},
{
"id": 2,
"name": "Summary PDF"
}
]
}
]
}
I have tried the following queries, with poor results.
SELECT * FROM c WHERE c.documentTypes.configuration[0].list[0].id FROM c
and
SELECT
p.id,
p.name
FROM f
JOIN c IN f.configuration
JOIN p IN c.list
WHERE c.id == 'documentTypes'
Q: How can I get only the list of names and ids?
Is this what you need?
SELECT ARRAY(SELECT VALUE e FROM c JOIN d IN c["configuration"] JOIN e IN d["list"]) AS Result FROM c
Output:
[
{
"Result": [
{
"id": 1,
"name": "Supporting Documents"
},
{
"id": 2,
"name": "Summary PDF"
}
]
}
]
My own solution was similar to Sajeetharan's:
SELECT list.id, list.name FROM c
JOIN configuration IN c.configuration
JOIN list IN configuration.list
WHERE c.id = 'documentTypes'
This looks a bit simpler to me because it needs neither the [""] syntax nor the ARRAY() function, and it doesn't produce the additional Result wrapper. I have no idea whether there is any performance difference.
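To make the effect of the two JOIN ... IN clauses concrete, here is a rough Python equivalent that flattens the nested arrays of one document. It is only an illustration of the iteration order, not Cosmos DB code; `flatten_list_items` is a hypothetical name.

```python
# The sample document from the question.
document = {
    "id": "documentTypes",
    "SomeThing": "SomeThing",
    "configuration": [
        {
            "language": "en",
            "list": [
                {"id": 1, "name": "Supporting Documents"},
                {"id": 2, "name": "Summary PDF"},
            ],
        }
    ],
}


def flatten_list_items(doc):
    """Mimic JOIN configuration IN c.configuration JOIN list IN configuration.list."""
    return [
        {"id": item["id"], "name": item["name"]}
        for configuration in doc.get("configuration", [])  # outer JOIN
        for item in configuration.get("list", [])          # inner JOIN
    ]


print(flatten_list_items(document))
```

Each (configuration, list item) pair becomes one output row, which is why the JOIN version returns flat rows instead of a single Result array.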

Unable to fetch the entire column index based on the value using JSONPath finder in npm

I have the response payload below. I want to check whether the amount equals 1000 and, if it matches, get the entire column array as output.
Sample Input:
{
"sqlQuery": "select SET_UNIQUE, amt as AMOUNT from transactionTable where SET_USER_ID=11651 ",
"message": "2 rows selected",
"row": [
{
"column": [
{
"value": "22621264",
"name": "SET_UNIQUE"
},
{
"value": "1000",
"name": "AMOUNT"
}
]
},
{
"column": [
{
"value": "226064213",
"name": "SET_UNIQUE"
},
{
"value": "916",
"name": "AMOUNT"
}
]
}
]
}
Expected Output:
"column": [
{
"value": "22621264",
"name": "SET_UNIQUE"
},
{
"value": "1000",
"name": "AMOUNT"
}
]
In the sample above, I want to fetch the entire column array when the AMOUNT value is 1000.
I tried the expressions below, but with no luck.
1. row[*].column[?(@.value==1000)].column
2. row[*].column[?(@.value==1000)]
I don't want to do this by index, because the position can change.
Any ideas please?
I think you'd need nested expressions, which aren't widely supported. Something like
$.row[?(@.column[?(@.value==1000)])]
The inner expression returns the matches for value==1000, and the outer expression then checks whether any such matches exist.
Another alternative that might work is
$.row[?(@.column[*].value==1000)]
but this assumes some implicit type conversions that may or may not be supported.
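If the JSONPath library at hand does not support nested filters, the same selection can be done in plain code after parsing the payload. The sketch below is in Python purely for illustration (the question uses an npm package); `rows_with_amount` is a hypothetical helper. It compares strings explicitly, because the payload stores "1000" as a string, which is exactly the type-conversion concern mentioned above.

```python
# Trimmed copy of the sample payload from the question.
payload = {
    "message": "2 rows selected",
    "row": [
        {"column": [
            {"value": "22621264", "name": "SET_UNIQUE"},
            {"value": "1000", "name": "AMOUNT"},
        ]},
        {"column": [
            {"value": "226064213", "name": "SET_UNIQUE"},
            {"value": "916", "name": "AMOUNT"},
        ]},
    ],
}


def rows_with_amount(data, amount):
    """Return every row whose AMOUNT column equals `amount` (compared as strings)."""
    wanted = str(amount)
    return [
        row
        for row in data.get("row", [])
        if any(
            col.get("name") == "AMOUNT" and col.get("value") == wanted
            for col in row.get("column", [])
        )
    ]


matches = rows_with_amount(payload, 1000)
print(matches)
```

Filtering on the column name as well as the value avoids accidentally matching a SET_UNIQUE of "1000".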

Cosmos Db: How to query for the maximum value of a property in an array of arrays?

I'm not sure how to query when using CosmosDb, as I'm used to SQL. My question is how to get the maximum value of a property in an array of arrays. I've been trying subqueries so far, but apparently I don't understand very well how they work.
In a structure such as the one below, how do I query for the city with the largest population among all states, using the Data Explorer in Azure:
{
"id": 1,
"states": [
{
"name": "New York",
"cities": [
{
"name": "New York",
"population": 8500000
},
{
"name": "Hempstead",
"population": 750000
},
{
"name": "Brookhaven",
"population": 500000
}
]
},
{
"name": "California",
"cities":[
{
"name": "Los Angeles",
"population": 4000000
},
{
"name": "San Diego",
"population": 1400000
},
{
"name": "San Jose",
"population": 1000000
}
]
}
]
}
This is currently not possible as far as I know.
It would look a bit like this:
SELECT TOP 1 state.name as stateName, city.name as cityName, city.population FROM c
join state in c.states
join city in state.cities
--order by city.population desc <-- this does not work in this case
You could write a user-defined function (UDF) that lets you write the query you probably expect, similar to this: CosmosDB sort results by a value into an array
The result could look like:
SELECT c.name, udf.OnlyMaxPop(c.states) FROM c
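where the UDF is defined as:
function OnlyMaxPop(states){
  // Order states by the population of their remaining (largest) city, descending.
  function compareStates(stateA, stateB){
    return stateB.cities[0].population - stateA.cities[0].population;
  }
  // For each state, keep only the city (or cities) with the maximum population.
  var onlyWithOneCity = states.map(s => {
    var maxpop = Math.max.apply(Math, s.cities.map(o => o.population));
    return {
      name: s.name,
      cities: s.cities.filter(x => x.population === maxpop)
    };
  });
  // The first state after sorting holds the overall largest city.
  return onlyWithOneCity.sort(compareStates)[0];
}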
You would probably need to adapt the function to your exact query needs, but I am not certain what your desired result would look like.
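If fetching the document into application code is acceptable, the same result can be computed client-side. The following Python sketch assumes the document shape from the question; `largest_city` is a hypothetical helper and is not part of the answer's UDF.

```python
# The sample document from the question.
document = {
    "id": 1,
    "states": [
        {"name": "New York", "cities": [
            {"name": "New York", "population": 8500000},
            {"name": "Hempstead", "population": 750000},
            {"name": "Brookhaven", "population": 500000},
        ]},
        {"name": "California", "cities": [
            {"name": "Los Angeles", "population": 4000000},
            {"name": "San Diego", "population": 1400000},
            {"name": "San Jose", "population": 1000000},
        ]},
    ],
}


def largest_city(doc):
    """Return the state name, city name and population of the most populous city."""
    candidates = [
        (state["name"], city["name"], city["population"])
        for state in doc.get("states", [])
        for city in state.get("cities", [])
    ]
    if not candidates:
        return None  # no states or no cities at all
    state_name, city_name, population = max(candidates, key=lambda c: c[2])
    return {"stateName": state_name, "cityName": city_name, "population": population}


print(largest_city(document))
```

The output keys mirror the aliases used in the query sketch above (stateName, cityName, population).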

Azure CosmosDB sql join not returning results when the child contains empty array

We have the JSON structure below, with nested arrays of objects. Some of the arrays may be empty.
[
{
"adjustments": [
{
"id": "1_0000001",
"clientID": 1,
"adjustmentID": "0000001",
"chargeID": "0000001",
"dateOfEntry": "2019-01-29T00:00:00",
"adjustmentAmount": 200
}
],
"payments": [
{
"id": "1_0000001",
"clientID": 1,
"paymentID": "0000001",
"chargeID": "0000001",
"dateOfDeposit": "2019-01-28T00:00:00",
"dateOfEntry": "2019-01-29T00:00:00",
"paymentAmount": 250
},
{
"id": "1_0000002",
"clientID": 1,
"paymentID": "0000002",
"chargeID": "0000001",
"dateOfDeposit": "2019-01-28T00:00:00",
"dateOfEntry": "2019-01-29T00:00:00",
"paymentAmount": 50
}
],
"id": "1_0000001",
"clientID": 1,
"chargeID": "0000001",
"encounterID": "0000001",
"patientID": "1234567",
"dateOfServiceBegin": "2019-01-20T00:00:00",
"dateOfServiceEnd": "2019-01-20T00:00:00",
"dateOfEntry": "2019-01-21T00:00:00",
"location": "Main Campus",
"chargeTotal": 500
},
{
"adjustments": [],
"payments": [],
"id": "1_0000001",
"clientID": 1,
"chargeID": "0000001",
"encounterID": "0000001",
"patientID": "1234567",
"dateOfServiceBegin": "2019-02-20T00:00:00",
"dateOfServiceEnd": "2019-02-20T00:00:00",
"dateOfEntry": "2019-02-21T00:00:00",
"location": "Main Campus",
"chargeTotal": 500
}
]
I am trying to execute the query below:
SELECT udf.getMonthAndYearPart(c.dateOfEntry) as date, sum(p.paymentAmount) as paymentAmount , sum(c.chargeTotal) as chargeAmount , sum(a.adjustmentAmount) as adjustmentAmount FROM c
JOIN p IN c.payments
JOIN a IN c.adjustments
where c.dateOfEntry >= '2019-01-11T18:30:00.000Z' and c.dateOfEntry <= '2020-12-30T18:30:00.000Z'
GROUP BY udf.getMonthAndYearPart(c.dateOfEntry)
I am expecting the below result
[
{
"date": "January-2019",
"paymentAmount": 300,
"chargeAmount": 1000,
"adjustmentAmount": 400
},
{
"date": "February-2019",
"chargeAmount": 500
}
]
But I got only first object
[
{
"date": "January-2019",
"paymentAmount": 300,
"chargeAmount": 1000,
"adjustmentAmount": 400
}
]
Is there anything I can do without a join? I want to calculate the sum of the child objects' amounts, grouped by month. Please help.
I found a solution myself, using subqueries and GROUP BY. Here is the query in case anyone needs it.
SELECT sum(k.totalPaymentAmount) as totalPaymentAmount,
sum(k.totalAdjustmentAmount) as totalAdjustmentAmount,
sum(k.totalCharge) as totalCharge,
k.date as date
FROM (SELECT
(SELECT value sum(c.paymentAmount) FROM c IN RevenueAnalytics.payments) as totalPaymentAmount,
(SELECT value sum(c.adjustmentAmount) FROM c IN RevenueAnalytics.adjustments) as totalAdjustmentAmount,
RevenueAnalytics.chargeTotal as totalCharge,
udf.getMonthAndYearPart(RevenueAnalytics.dateOfServiceBegin) as date
FROM RevenueAnalytics) k
GROUP BY k.date
In your case you would need a LEFT JOIN in your query to include documents with empty adjustments or payments. LEFT JOIN is not supported at the moment; you can vote on this thread to have the feature included. In the meantime you can create a procedure that runs two separate queries: one using joins, as you are doing now, and one without joins that filters (in the WHERE clause) for entries where ARRAY_LENGTH is 0 for adjustments and payments. Then aggregate all the results and return them.
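To illustrate why the joined query loses the February document, here is a small Python sketch (not Cosmos DB code). It treats the two JOIN clauses as a cross product of each document's arrays and contrasts that with summing each array separately per document, as the subquery solution does. The sample is trimmed to the relevant fields, and `join_rows` and `per_document_totals` are hypothetical helper names.

```python
from itertools import product

# Trimmed versions of the two documents from the question.
charges = [
    {
        "dateOfEntry": "2019-01-21T00:00:00",
        "chargeTotal": 500,
        "payments": [{"paymentAmount": 250}, {"paymentAmount": 50}],
        "adjustments": [{"adjustmentAmount": 200}],
    },
    {
        "dateOfEntry": "2019-02-21T00:00:00",
        "chargeTotal": 500,
        "payments": [],
        "adjustments": [],
    },
]


def join_rows(doc):
    """Rows from JOIN p IN c.payments JOIN a IN c.adjustments: a cross product."""
    return list(product(doc["payments"], doc["adjustments"]))


def per_document_totals(doc):
    """Sum each child array on its own, so empty arrays simply contribute 0."""
    return {
        "month": doc["dateOfEntry"][:7],
        "payments": sum(p["paymentAmount"] for p in doc["payments"]),
        "adjustments": sum(a["adjustmentAmount"] for a in doc["adjustments"]),
        "charge": doc["chargeTotal"],
    }


for doc in charges:
    print(len(join_rows(doc)), per_document_totals(doc))
```

In this sketch the first document yields two joined rows, so summing chargeTotal or adjustmentAmount over the joined rows counts them twice (1000 and 400). The second document yields no rows at all and therefore disappears from the joined result.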

Aggregation in arangodb using AQL

I'm attempting a fairly basic task in arangodb, using the SUM() aggregate function.
Here is a working query which returns the right data (though not yet aggregated):
FOR m IN pkg_spp_RegMem
FILTER m.memberId == "40289"
COLLECT member = m.memberId INTO g
RETURN { "memberId" : member, "amount" : g[*].m[*].items }
This returns the following results:
[
{
"memberId": "40289",
"amount": [
[
{
"amount": 50,
"description": "some description"
}
],
[
{
"amount": 50,
"description": "some description"
},
{
"amount": 500,
"description": "some description"
},
{
"amount": 0,
"description": "some description"
}
],
[
{
"amount": 0,
"description": "some description"
}
]
]
}
]
I am using COLLECT to group the results because a given memberId may have multiple 'RegMem' objects. As you can see from the query and results, each object has a list of smaller objects called 'items', with each item having an amount and a description.
I want to SUM() the amounts by member. However, adjusting the query like this does not work:
FOR m IN pkg_spp_RegMem
FILTER m.memberId == "40289"
COLLECT member = m.memberId INTO g
RETURN { "memberId" : member, "amount" : SUM(g[*].m[*].items[*].amount) }
It returns 0 because it apparently can't find a field in the expanded items list called amount.
Looking at the results I can sort of understand why: items is actually returned as a list of lists of objects with amount/description. But I don't understand how to reference or expand the unnamed list correctly to return the amount field values for the SUM() function.
Ideally the query should return the memberId and total amount, one row per member such that I can remove the filter and execute for all members.
Many thanks in advance if you can help!
Martin
PS I've worked through the AQL tutorial on the arangodb website and checked out the manual but what would really help me is loads more example queries to look through. If anyone knows of a resource like that or wants to share some of their own, 'much obliged. Cheers!
Edited: I misread the question the first time. The first answer can be seen in the edit history, as it also contains some hints:
I replicated your data by creating some documents in this format (and some with only one item):
{
"memberId": "40289",
"items": [
{
"amount": 50,
"description": "some description"
},
{
"amount": 500,
"description": "some description"
}
]
}
Based on some of those types of documents, your non-summarized query should indeed be looking like this:
FOR m IN pkg_spp_RegMem
FILTER m.memberId == "40289"
COLLECT member = m.memberId INTO g
RETURN { "memberId" : member, "amount" : g[*].m[*].items }
The data returned:
[
{
"memberId": "40289",
"amount": [
[
{
"amount": 50,
"description": "some description"
},
{
"amount": 0,
"description": "some description"
}
],
[
{
"amount": 50,
"description": "some description"
},
{
"amount": 0,
"description": "some description"
}
],
[
{
"amount": 50,
"description": "some description"
}
],
[
{
"amount": 50,
"description": "some description"
},
{
"amount": 500,
"description": "some description"
}
],
[
{
"amount": 0,
"description": "some description"
}
],
[
{
"amount": 50,
"description": "some description"
},
{
"amount": 500,
"description": "some description"
}
]
]
}
]
Starting from the non-summarized version, you need to loop through the items of the groups generated by COLLECT and apply SUM() there.
To SUM the items you must first FLATTEN() them into a single list.
FOR m IN pkg_spp_RegMem
FILTER m.memberId == "40289"
COLLECT member = m.memberId INTO g
RETURN { "memberId" : member, "amount" : SUM(
FLATTEN(
(
FOR r in g[*].m[*].items
RETURN r[*].amount
)
)
)
}
This results in:
[
{
"memberId": "40289",
"amount": 1250
}
]
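The flatten-then-sum idea can also be checked outside the database. The Python sketch below applies it to the nested "amount" lists shown in the question's first result; it is only an illustration of the technique, and `total_amount` is a hypothetical helper name.

```python
from itertools import chain

# The list of item lists returned for member "40289" in the question.
item_lists = [
    [{"amount": 50, "description": "some description"}],
    [
        {"amount": 50, "description": "some description"},
        {"amount": 500, "description": "some description"},
        {"amount": 0, "description": "some description"},
    ],
    [{"amount": 0, "description": "some description"}],
]


def total_amount(nested_items):
    """Flatten the list of lists one level, then sum the amount of every item."""
    return sum(item["amount"] for item in chain.from_iterable(nested_items))


print(total_amount(item_lists))
```

Summing the nested lists directly would fail for the same reason the original AQL query returned 0: the outer list contains lists, not objects with an amount field.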
