Aggregation in arangodb using AQL - arangodb

I'm attempting a fairly basic task in arangodb, using the SUM() aggregate function.
Here is a working query which returns the right data (though not yet aggregated):
FOR m IN pkg_spp_RegMem
FILTER m.memberId == "40289"
COLLECT member = m.memberId INTO g
RETURN { "memberId" : member, "amount" : g[*].m[*].items }
This returns the following results:
[
{
"memberId": "40289",
"amount": [
[
{
"amount": 50,
"description": "some description"
}
],
[
{
"amount": 50,
"description": "some description"
},
{
"amount": 500,
"description": "some description"
},
{
"amount": 0,
"description": "some description"
}
],
[
{
"amount": 0,
"description": "some description"
}
]
]
}
]
I am using COLLECT to group the results because a given memberId may have multiple 'RegMem' objects. As you can see from the query and its results, each object has a list of smaller objects called 'items', with each item having an amount and a description.
I want to SUM() the amounts by member. However, adjusting the query like this does not work:
FOR m IN pkg_spp_RegMem
FILTER m.memberId == "40289"
COLLECT member = m.memberId INTO g
RETURN { "memberId" : member, "amount" : SUM(g[*].m[*].items[*].amount) }
It returns 0, apparently because it can't find a field called amount in the expanded items list.
Looking at the results I can sort of understand why: items is actually returned as a list of lists of objects, each object having an amount and a description. But I don't understand how to reference or expand the unnamed inner lists correctly so that the amount values reach the SUM() function.
Ideally the query should return the memberId and total amount, one row per member such that I can remove the filter and execute for all members.
Many thanks in advance if you can help!
Martin
PS I've worked through the AQL tutorial on the ArangoDB website and checked out the manual, but what would really help me is many more example queries to look through. If anyone knows of a resource like that or wants to share some of their own, much obliged. Cheers!

Edited: I misread the question the first time. The first version of this answer can be seen in the edit history, as it also contains some hints.
I replicated your data by creating some documents in this format (and some with only one item):
{
"memberId": "40289",
"items": [
{
"amount": 50,
"description": "some description"
},
{
"amount": 500,
"description": "some description"
}
]
}
With some documents of that type, your non-summarized query should indeed look like this:
FOR m IN pkg_spp_RegMem
FILTER m.memberId == "40289"
COLLECT member = m.memberId INTO g
RETURN { "memberId" : member, "amount" : g[*].m[*].items }
The data returned:
[
{
"memberId": "40289",
"amount": [
[
{
"amount": 50,
"description": "some description"
},
{
"amount": 0,
"description": "some description"
}
],
[
{
"amount": 50,
"description": "some description"
},
{
"amount": 0,
"description": "some description"
}
],
[
{
"amount": 50,
"description": "some description"
}
],
[
{
"amount": 50,
"description": "some description"
},
{
"amount": 500,
"description": "some description"
}
],
[
{
"amount": 0,
"description": "some description"
}
],
[
{
"amount": 50,
"description": "some description"
},
{
"amount": 500,
"description": "some description"
}
]
]
}
]
Starting from the non-summarized version, you need to loop through the items of the groups generated by the COLLECT statement and do your SUM() there.
To be able to SUM() the amounts, you must first FLATTEN() the nested lists into a single list.
FOR m IN pkg_spp_RegMem
  FILTER m.memberId == "40289"
  COLLECT member = m.memberId INTO g
  RETURN {
    "memberId": member,
    "amount": SUM(
      FLATTEN(
        (
          FOR r IN g[*].m[*].items
            RETURN r[*].amount
        )
      )
    )
  }
This results in:
[
{
"memberId": "40289",
"amount": 1250
}
]
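For readers more at home in a general-purpose language, the same flatten-then-sum idea can be sketched in Python. This is only an illustration of the logic, not something ArangoDB executes; the names groups and total_amount are hypothetical, and the nested list mirrors the shape of the non-summarized result shown earlier (descriptions omitted).

```python
from itertools import chain

# Hypothetical stand-in for g[*].m[*].items: one inner list of items per document.
groups = [
    [{"amount": 50}, {"amount": 0}],
    [{"amount": 50}, {"amount": 0}],
    [{"amount": 50}],
    [{"amount": 50}, {"amount": 500}],
    [{"amount": 0}],
    [{"amount": 50}, {"amount": 500}],
]

def total_amount(item_lists):
    """Flatten the list of lists (like FLATTEN) and add up the amounts (like SUM)."""
    return sum(item["amount"] for item in chain.from_iterable(item_lists))

print(total_amount(groups))
```

Summing the outer list directly would fail for the same reason the original query did: its elements are lists, not objects with an amount.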

Related

Unable to fetch the entire column index based on the value using JSONPath finder in npm

I have the response payload below. I want to check whether the AMOUNT value equals 1000 and, if it matches, get the entire column array as output.
Sample Input:
{
"sqlQuery": "select SET_UNIQUE, amt as AMOUNT from transactionTable where SET_USER_ID=11651 ",
"message": "2 rows selected",
"row": [
{
"column": [
{
"value": "22621264",
"name": "SET_UNIQUE"
},
{
"value": "1000",
"name": "AMOUNT"
}
]
},
{
"column": [
{
"value": "226064213",
"name": "SET_UNIQUE"
},
{
"value": "916",
"name": "AMOUNT"
}
]
}
]
}
Expected Output:
"column": [
{
"value": "22621264",
"name": "SET_UNIQUE"
},
{
"value": "1000",
"name": "AMOUNT"
}
]
In the sample above, I want to fetch the entire column array when the AMOUNT value is 1000.
I tried the expressions below, but with no luck:
1. row[*].column[?(@.value==1000)].column
2. row[*].column[?(@.value==1000)]
I don't want to do this by index, because the index may change.
Any ideas please?
I think you'd need nested expressions, which isn't something that's widely supported. Something like
$.row[?(@.column[?(@.value==1000)])]
The inner expression returns matches for value==1000, then the outer expression checks for existence of those matches.
Another alternative that might work is
$.row[?(@.column[*].value==1000)]
but this assumes some implicit type conversions that may or may not be supported.
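If the JSONPath library in use does not support nested filters, the same selection can be done in plain code after parsing the payload. A minimal sketch in Python, assuming the response has been parsed into a dict; the function name columns_with_amount is hypothetical, and values are compared as strings because the payload stores them as strings.

```python
payload = {
    "row": [
        {"column": [
            {"value": "22621264", "name": "SET_UNIQUE"},
            {"value": "1000", "name": "AMOUNT"},
        ]},
        {"column": [
            {"value": "226064213", "name": "SET_UNIQUE"},
            {"value": "916", "name": "AMOUNT"},
        ]},
    ]
}

def columns_with_amount(data, amount):
    """Return every 'column' list that contains an AMOUNT entry equal to `amount`."""
    wanted = str(amount)  # the payload stores numbers as strings
    return [
        row["column"]
        for row in data["row"]
        if any(c["name"] == "AMOUNT" and c["value"] == wanted for c in row["column"])
    ]

print(columns_with_amount(payload, 1000))
```

Checking the name as well as the value avoids matching a SET_UNIQUE entry that happens to hold "1000".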

Cosmos Db: How to query for the maximum value of a property in an array of arrays?

I'm not sure how to query when using CosmosDb as I'm used to SQL. My question is about how to get the maximum value of a property in an array of arrays. I've been trying subqueries so far but apparently I don't understand very well how they work.
In a structure such as the one below, how do I query for the city with the largest population among all states, using the Data Explorer in Azure?
{
"id": 1,
"states": [
{
"name": "New York",
"cities": [
{
"name": "New York",
"population": 8500000
},
{
"name": "Hempstead",
"population": 750000
},
{
"name": "Brookhaven",
"population": 500000
}
]
},
{
"name": "California",
"cities":[
{
"name": "Los Angeles",
"population": 4000000
},
{
"name": "San Diego",
"population": 1400000
},
{
"name": "San Jose",
"population": 1000000
}
]
}
]
}
This is currently not possible as far as I know.
It would look a bit like this:
SELECT TOP 1 state.name as stateName, city.name as cityName, city.population FROM c
join state in c.states
join city in state.cities
--order by city.population desc <-- this does not work in this case
You could write a user defined function that will allow you to write the query you probably expect, similar to this: CosmosDB sort results by a value into an array
The result could look like:
SELECT c.name, udf.OnlyMaxPop(c.states) FROM c
function OnlyMaxPop(states) {
    // Sort descending by the population of each state's remaining (largest) city.
    function compareStates(stateA, stateB) {
        return stateB.cities[0].population - stateA.cities[0].population;
    }
    // Reduce every state to its most populous city (ties are kept).
    var onlyWithOneCity = states.map(s => {
        var maxPop = Math.max.apply(Math, s.cities.map(o => o.population));
        return {
            name: s.name,
            cities: s.cities.filter(x => x.population === maxPop)
        };
    });
    return onlyWithOneCity.sort(compareStates)[0];
}
You would probably need to adapt the function to your exact query needs, but I am not certain what your desired result would look like.
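Another option is to fetch the document and pick the city on the client side. A minimal sketch in Python, assuming the document has already been loaded into a dict named doc (a hypothetical name) with the structure from the question; no Cosmos DB SDK call is shown here.

```python
doc = {
    "id": 1,
    "states": [
        {"name": "New York", "cities": [
            {"name": "New York", "population": 8500000},
            {"name": "Hempstead", "population": 750000},
            {"name": "Brookhaven", "population": 500000},
        ]},
        {"name": "California", "cities": [
            {"name": "Los Angeles", "population": 4000000},
            {"name": "San Diego", "population": 1400000},
            {"name": "San Jose", "population": 1000000},
        ]},
    ],
}

def most_populous_city(document):
    """Return (state name, city name, population) for the largest city overall."""
    pairs = (
        (state["name"], city["name"], city["population"])
        for state in document["states"]
        for city in state["cities"]
    )
    # max() raises ValueError if the document contains no cities at all.
    return max(pairs, key=lambda p: p[2])

print(most_populous_city(doc))
```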

How to extract selected key and value from nested dictionary object in a list?

I have a list, example_list, that contains two dict objects. It looks like this:
[
{
"Meta": {
"ID": "1234567",
"XXX": "XXX"
},
"bbb": {
"ccc": {
"ddd": {
"eee": {
"fff": {
"xxxxxx": "xxxxx"
},
"www": [
{
"categories": {
"ppp": [
{
"content": {
"name": "apple",
"price": "0.111"
},
"xxx": "xxx"
}
]
},
"date": "A2020-01-01"
}
]
}
}
}
}
},
{
"Meta": {
"ID": "78945612",
"XXX": "XXX"
},
"bbb": {
"ccc": {
"ddd": {
"eee": {
"fff": {
"xxxxxx": "xxxxx"
},
"www": [
{
"categories": {
"ppp": [
{
"content": {
"name": "banana",
"price": "12.599"
},
"xxx": "xxx"
}
]
},
"date": "A2020-01-01"
}
]
}
}
}
}
}
]
Now I want to filter the items and keep only "ID": "xxx" and the corresponding value for "price", e.g. "price": "0.111". The expected result can be something similar to:
[{"ID": "1234567", "price": "0.111"}, {"ID": "78945612", "price": "12.599"}]
or something like {"1234567":"0.111", "78945612":"12.599" }
Here's what I've tried:
map_list=[]
map_dict={}
for item in example_list:
    #get 'ID' for each item in 'meta'
    map_dict['ID'] = item['meta']['ID']
    # get 'price'
    data_list = item['bbb']['ccc']['ddd']['www']
    for data in data_list:
        for dataitem in data['categories']['ppp']
            map_dict['price'] = item["content"]["price"]
            map_list.append(map_dict)
print(map_list)
The result doesn't look right; it feels like the items aren't being iterated properly. It gives me this result:
[{"ID": "78945612", "price": "12.599"}, {"ID": "78945612", "price": "12.599"}]
It gave me duplicated result for the second ID but where is the first ID?
Can someone take a look for me please, thanks.
Update:
From comments on another question, I understand that the output keeps being overwritten because the key name in the dict is always the same. But I'm not sure how to fix this, because the key and the value need to be extracted at different levels of the for loops. Any help would be appreciated, thanks.
As @Scott Hunter has mentioned, you need to create a new map_dict every time you add an entry. Here is a quick fix to your solution (I am sadly not able to test it right now, but it seems right to me).
map_list = []
for item in example_list:
    # get the list that holds the price entries
    data_list = item['bbb']['ccc']['ddd']['eee']['www']
    for data in data_list:
        for dataitem in data['categories']['ppp']:
            # create a fresh dict each time so earlier entries are not overwritten
            map_dict = {}
            map_dict['ID'] = item['Meta']['ID']
            map_dict['price'] = dataitem['content']['price']
            map_list.append(map_dict)
print(map_list)
However, this approach is basically just "forcing" your way through the structure. I recommend taking a break and working through a tutorial on nested data structures, which will help you understand how the iteration really works. This is how I would have written it:
list_dicts = []
for item in example_list:
    for www in item['bbb']['ccc']['ddd']['eee']['www']:
        # each 'www' entry holds its priced entries under categories -> ppp
        for ppp_item in www['categories']['ppp']:
            list_dicts.append({
                'ID': item['Meta']['ID'],
                'price': ppp_item['content']['price']
            })
Good luck with this problem and hope it helps :)
You need to create a new dictionary for map_dict for each ID.
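The question also mentions a mapping of ID to price as an acceptable result. A minimal sketch of that variant, using a dict comprehension over a trimmed copy of the sample data (only the keys on the path to the price are kept); the name price_by_id is hypothetical. Note that if one ID had several prices, later ones would overwrite earlier ones in this form.

```python
example_list = [
    {"Meta": {"ID": "1234567"},
     "bbb": {"ccc": {"ddd": {"eee": {"www": [
         {"categories": {"ppp": [{"content": {"name": "apple", "price": "0.111"}}]}}
     ]}}}}},
    {"Meta": {"ID": "78945612"},
     "bbb": {"ccc": {"ddd": {"eee": {"www": [
         {"categories": {"ppp": [{"content": {"name": "banana", "price": "12.599"}}]}}
     ]}}}}},
]

# Every key/value pair is built inside the comprehension, so no shared dict
# is mutated across iterations.
price_by_id = {
    item["Meta"]["ID"]: entry["content"]["price"]
    for item in example_list
    for www in item["bbb"]["ccc"]["ddd"]["eee"]["www"]
    for entry in www["categories"]["ppp"]
}

print(price_by_id)
```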

How to GROUP BY in a CouchDB reduce function

I am trying to get to grips with map/reduce queries when using PouchDB/CouchDB.
I have a lot of documents in my database but I need to create a design that queries the documents and gives me all of the unique team names as a key and then tells me
a) how many unique wards are within each team
b) the total number of jobs per team (across all wards)
The structure of my data is:
{
"_id": "0448071807c0f37f53e06aab54034a42",
"_rev": "6-13fd78ada9c8833ec36a01af0acd5957",
"team": "Team A",
"ward": "Ward A",
"date": "2017-03-30",
"person": "Alice",
"bed": "Bed 001",
"jobs": [1,2,3,4]
}
{
"_id": "0448071807c0f37f53e06aab54034a42",
"_rev": "6-13fd78ada9c8833ec36a01af0acd5957",
"team": "Team A",
"ward": "Ward B",
"date": "2017-03-30",
"person": "Bob",
"bed": "Bed 001",
"jobs": [1,2]
}
{
"_id": "0448071807c0f37f53e06aab54034a42",
"_rev": "6-13fd78ada9c8833ec36a01af0acd5957",
"team": "Team A",
"ward": "Ward C",
"date": "2017-03-30",
"person": "Charles",
"bed": "Bed 001",
"jobs": [9,5]
}
{
"_id": "0448071807c0f37f53e06aab54034a42",
"_rev": "6-13fd78ada9c8833ec36a01af0acd5957",
"team": "Team B",
"ward": "Ward 00",
"date": "2017-03-30",
"person": "David",
"bed": "Bed 001",
"jobs": [1]
}
The output I would expect would be like this:
Team A
- 3 unique wards
- 8 jobs
Team B
- 1 unique ward
- 1 job
e.g.
{
"key": "Team A",
"value": {
"wards": 3,
"jobs": 8
}
}
{
"key": "Team B",
"value": {
"wards": 1,
"jobs": 1
}
}
My map is currently:
{
"all": {
"map": "function(doc) { emit(doc.team, doc) }"
}
}
It is the reduce where my struggle comes in.
EDIT
I have taken the suggestions used on CouchDB View equivalent of SUM & GROUP BY but this only goes half way towards my challenge.
If I use:
{
"all": {
"map": "function(doc) { emit([doc.team, doc.ward], 1) }",
"reduce": "function(keys, values) { return sum(values); }"
}
}
And then go to http://my-ip:5984/wardround_jobs/_design/teams/_view/all?group_level=1 then I see the unique teams (good) and the number of occurrences (also great) but I am unsure how I extend the reduce function to include the total number of jobs.
First, emit the length of the jobs array (the number of jobs) as the value:
function (doc) {
    emit([doc.team, doc.ward], doc.jobs.length);
}
Then, you need a reduce function like this:
function (keys, values, rereduce) {
    var stats = {uniq: 0, jobs: 0};
    if (rereduce) {
        for (var i = 0; i < values.length; i++) {
            stats.uniq += values[i].uniq;
            stats.jobs += values[i].jobs;
        }
        return stats;
    }
    stats.uniq = values.length;
    stats.jobs = sum(values);
    return stats;
}
On the first pass, we return an object (stats) with the number of wards per team (uniq) and the number of jobs (the sum of the jobs lengths for every team/ward pair).
Then, for the rereduce, we simply aggregate the objects' values.
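To see how the two branches fit together, the map and reduce steps can be simulated outside the database. This is a rough Python sketch, not CouchDB code: the helper names map_doc, reduce_values and view_group_level_1 are hypothetical, and the grouping function only imitates what a query with group_level=1 would do on the sample documents.

```python
from collections import defaultdict

docs = [
    {"team": "Team A", "ward": "Ward A", "jobs": [1, 2, 3, 4]},
    {"team": "Team A", "ward": "Ward B", "jobs": [1, 2]},
    {"team": "Team A", "ward": "Ward C", "jobs": [9, 5]},
    {"team": "Team B", "ward": "Ward 00", "jobs": [1]},
]

def map_doc(doc):
    """Mirror of the map function: key is (team, ward), value is the job count."""
    return (doc["team"], doc["ward"]), len(doc["jobs"])

def reduce_values(values, rereduce=False):
    """Mirror of the reduce function, including the rereduce branch."""
    if rereduce:
        # values are stats objects produced by earlier reduce calls
        return {"uniq": sum(v["uniq"] for v in values),
                "jobs": sum(v["jobs"] for v in values)}
    # values are the raw job counts emitted by the map step
    return {"uniq": len(values), "jobs": sum(values)}

def view_group_level_1(documents):
    """Group emitted rows by the first key element, as group_level=1 would."""
    grouped = defaultdict(list)
    for key, value in map(map_doc, documents):
        grouped[key[0]].append(value)
    return {team: reduce_values(values) for team, values in grouped.items()}

result = view_group_level_1(docs)
print(result)
```

Note that uniq counts emitted rows, so it equals the number of unique wards only when each team/ward pair appears in a single document, as in the sample data.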

Marklogic 8 Node.js API - How can I scope a search on a property child of root?

[updated 17:15 on 28/09]
I'm manipulating json data of type:
[
{
"id": 1,
"title": "Sun",
"seeAlso": [
{
"id": 2,
"title": "Rain"
},
{
"id": 3,
"title": "Cloud"
}
]
},
{
"id": 2,
"title": "Rain",
"seeAlso": [
{
"id": 3,
"title": "Cloud"
}
]
},
{
"id": 3,
"title": "Cloud",
"seeAlso": [
{
"id": 1,
"title": "Sun"
}
]
},
];
After inclusion in the database, a node.js search using
db.documents.query(
q.where(
q.collection('test films'),
q.value('title','Sun')
).withOptions({categories: 'none'})
)
.result( function(results) {
console.log(JSON.stringify(results, null,2));
});
will return both the film titled 'Sun' and the films which have a seeAlso/title property (forgive the xpath syntax) = 'Sun'.
I need to find 1/ films with title = 'Sun' 2/ films with seeAlso/title = 'Sun'.
I tried a container query using q.scope() with no success. I can't find how to scope the root object node (first case), and for the second case,
q.where(q.scope(q.property('seeAlso'), q.value('title','Sun')))
returns as first result an item which matches all text inside the root object node
{
"index": 1,
"uri": "/1.json",
"path": "fn:doc(\"/1.json\")",
"score": 137216,
"confidence": 0.6202662,
"fitness": 0.6701325,
"href": "/v1/documents?uri=%2F1.json&database=Documents",
"mimetype": "application/json",
"format": "json",
"matches": [
{
"path": "fn:doc(\"/1.json\")/object-node()",
"match-text": [
"Sun Rain Cloud"
]
}
]
},
which seems crazy.
Any idea how to do such searches on denormalized JSON data?
Laurent:
XPaths on JSON are supported by MarkLogic.
In particular, you might consider setting up a path range index to match /title at the root:
http://docs.marklogic.com/guide/admin/range_index#id_54948
Scoped property matching requires either filtering or indexed positions to be accurate. An alternative is to set up another path range index on /seeAlso/title.
For the match issue it would be useful to know the MarkLogic version and to see the entire query.
Hoping that helps,
