Finding sum of average sub aggregations

I'd like to get the sum of a sub aggregation. For example, I'm grouping by smartphones, then by carrier, and then I'm finding the average price of each carrier for that particular smartphone. I'd like to get the sum of the average prices for all carriers for each smartphone. So essentially, I want something like this:
{
"aggs": {
"group_by_smartphones": {
"terms": {
"field": "smartphone",
"order": {
"_term": "asc"
},
"size": 200
},
"aggs": {
"group_by_carrier": {
"terms": {
"field": "carrier",
"order": {
"group_by_avg": "desc"
}
},
"aggs": {
"group_by_avg": {
"avg": {
"field": "price"
}
}
}
},
"group_by_sum": {
"sum_bucket": {
"field": "group_by_smartphones>group_by_carrier>group_by_avg"
}
}
}
}
}
}
When I try this query, I get an error saying:
"type": "parsing_exception", "reason": "Unexpected token VALUE_STRING [field] in [group_by_sum]",
So essentially I want to get the sum of the averages of all carriers for a particular smartphone. Is there a way to get this?

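sum_bucket is a pipeline aggregation, so it takes a buckets_path rather than a field, and the path is relative to where the aggregation sits (inside group_by_smartphones). Your group_by_sum aggregation needs to be written like this: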
"group_by_sum": {
"sum_bucket": {
"buckets_path": "group_by_carrier>group_by_avg"
}
}
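To make the result of that path concrete, here is a minimal Python sketch that mimics the terms > terms > avg nesting locally and then sums the per-carrier averages for each smartphone. The sample documents and the helper name sum_of_carrier_averages are made up for illustration; nothing here talks to Elasticsearch.

```python
from collections import defaultdict

# Hypothetical sample documents (not from the original question).
docs = [
    {"smartphone": "A", "carrier": "X", "price": 100.0},
    {"smartphone": "A", "carrier": "X", "price": 200.0},
    {"smartphone": "A", "carrier": "Y", "price": 300.0},
    {"smartphone": "B", "carrier": "X", "price": 50.0},
]


def sum_of_carrier_averages(documents):
    """Group by smartphone, then by carrier, average the price per carrier,
    and sum those averages per smartphone. This is the value sum_bucket
    yields for buckets_path "group_by_carrier>group_by_avg"."""
    prices = defaultdict(lambda: defaultdict(list))
    for d in documents:
        prices[d["smartphone"]][d["carrier"]].append(d["price"])

    result = {}
    for phone, carriers in prices.items():
        averages = [sum(p) / len(p) for p in carriers.values()]
        result[phone] = sum(averages)
    return result


totals = sum_of_carrier_averages(docs)
print(totals)
```

One caveat: sum_bucket only sees the carrier buckets that the terms aggregation actually returns (10 by default), so group_by_carrier may need a larger size if a smartphone has more carriers than that.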

Related

How to perform sub aggregation that will calculate fields with no value per bucket?

I'm currently building the following Elasticsearch 6.8 query/aggregation:
{
"sort": [
{
"DateCreated": {
"order": "desc"
}
}
],
"query": {
"bool": {
"must": [
{
"match": {
"InternalEntityId": "ExampleValue1111"
}
},
{
"match": {
"Direction": "Inbound"
}
}
]
}
},
"aggs": {
"top_ext": {
"terms": {
"field": "ExternalAddress.keyword"
},
"aggs": {
"top_date": {
"top_hits": {
"sort": [
{
"DateCreated": {
"order": "desc"
}
}
],
"size": 1
}
}
}
}
}
}
How do we, in the same search, count the hits in each bucket that have no value (a must_not exists style query)?
Ideally, each bucket returned by the top_ext aggregation would include a count of the records that have no value.
Thanks!
You can do two things here:
1. Sort the "top_ext" terms aggregation buckets by ascending doc count and use the top n zero-size buckets.
2. Apply a bucket selector aggregation in parallel to your inner top hits so that only the buckets with a zero doc count appear.
Here is a query DSL that uses both of the above approaches. (You can plug in all the other required elements of the query; I have focused mainly on the aggregation part here.)
GET kibana_sample_data_ecommerce/_search
{
"size": 0,
"aggs": {
"outer": {
"terms": {
"field": "products.category.keyword",
"size": 10,
"order": {
"_count": "asc"
}
},
"aggs": {
"inner": {
"top_hits": {
"size": 10
}
},
"restrictedBuckets": {
"bucket_selector": {
"buckets_path": {
"docCount": "_count"
},
"script": "params.docCount<1"
}
}
}
}
}
}

Is there a Group BY function for finding result with elastic search query?

I have tried to do a group by with Elasticsearch, but I didn't get the expected result. Please help me fix this issue. The indexed data is:
data = [
{ "fruit":"apple", "taste":5, "timestamp":100},
{ "fruit":"pear", "taste":5, "timestamp":110},
{ "fruit":"apple", "taste":4, "timestamp":200},
{ "fruit":"pear", "taste":8, "timestamp":90},
{ "fruit":"banana", "taste":5, "timestamp":100}]
My query is:
myQuery = {"query": {
"match_all": {}
},
"aggs": {
"group_by_fruit": {
"terms": {
"field": "fruit.keyword"
},
}
}
}
It shows all 5 documents in the output, but I need to get only 3 records. The expected result is:
[
{ "fruit":"apple", "taste":4, "timestamp":200},
{ "fruit":"pear", "taste":8, "timestamp":90},
{ "fruit":"banana", "taste":5, "timestamp":100}]
If you want to get the documents with distinct fruit fields having the largest timestamp value you should use a top_hits aggregation.
{
"query": {
"match_all": {}
},
"size": 0,
"aggs": {
"top_tags": {
"terms": {
"field": "fruit.keyword",
"size": <MAX_NUMBER_OF_DISTINCT_FRUITS>
},
"aggs": {
"group_by_fruit": {
"top_hits": {
"sort": [
{
"timestamp": {
"order": "desc"
}
}
],
"size" : 1
}
}
}
}
}
}
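As a rough local illustration of what terms plus top_hits (size 1, sorted by timestamp descending) returns, the following Python sketch picks the document with the largest timestamp for each fruit from the question's data. The helper name latest_per_fruit is hypothetical, and the sketch runs in memory rather than against Elasticsearch.

```python
# Data copied from the question.
data = [
    {"fruit": "apple", "taste": 5, "timestamp": 100},
    {"fruit": "pear", "taste": 5, "timestamp": 110},
    {"fruit": "apple", "taste": 4, "timestamp": 200},
    {"fruit": "pear", "taste": 8, "timestamp": 90},
    {"fruit": "banana", "taste": 5, "timestamp": 100},
]


def latest_per_fruit(documents):
    """Keep, for each distinct fruit, the document with the largest timestamp."""
    latest = {}
    for doc in documents:
        current = latest.get(doc["fruit"])
        if current is None or doc["timestamp"] > current["timestamp"]:
            latest[doc["fruit"]] = doc
    return list(latest.values())


result = latest_per_fruit(data)
for doc in result:
    print(doc)
```

With this sort key, pear resolves to the timestamp 110 document rather than the timestamp 90 one listed in the question's expected output, so the sort field may need adjusting depending on which record is actually wanted per group.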

Getting sum sub aggregation

I'd like to get the sum of a sub aggregation. For example, I have group by smartphones, group by carrier and then the average price for that carrier. I'd like to get the sum of all prices for all carriers for a specific smartphone. So essentially, I want something like this:
{
"aggs": {
"group_by_smartphones": {
"terms": {
"field": "smartphone",
"order": {
"_term": "asc"
},
"size": 200
},
"aggs": {
"group_by_sum": {
"sum": {
"field": "price"
},
"aggs": {
"group_by_carrier": {
"terms": {
"field": "carrier",
"order": {
"group_by_avg": "desc"
}
},
"aggs": {
"group_by_avg": {
"avg": {
"field": "price"
}
}
}
}
}
}
}
}
}
}
Except, when I do it like this I get this error:
"type": "aggregation_initialization_exception",
"reason": "Aggregator [group_by_sum] of type [sum] cannot accept sub-aggregations"
How do I fix it so I can get the sum of all prices for each smartphone?
You're almost there. A sum aggregation is a metric aggregation and cannot have sub-aggregations, so the sum and group_by_carrier sub-aggregations both need to be at the same level:
{
"aggs": {
"group_by_smartphones": {
"terms": {
"field": "smartphone",
"order": {
"_term": "asc"
},
"size": 200
},
"aggs": {
"sum_prices": {
"sum": {
"field": "price"
}
},
"group_by_carrier": {
"terms": {
"field": "carrier",
"order": {
"group_by_avg": "desc"
}
},
"aggs": {
"group_by_avg": {
"avg": {
"field": "price"
}
}
}
}
}
}
}
}
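A small Python sketch may help show why the two sub-aggregations are siblings: each one is computed independently over the same smartphone bucket. The sample documents and the helper name summarize are assumptions made for illustration; the code runs locally without Elasticsearch.

```python
from collections import defaultdict

# Hypothetical sample documents (not from the original question).
docs = [
    {"smartphone": "A", "carrier": "X", "price": 100.0},
    {"smartphone": "A", "carrier": "X", "price": 200.0},
    {"smartphone": "A", "carrier": "Y", "price": 300.0},
    {"smartphone": "B", "carrier": "X", "price": 50.0},
]


def summarize(documents):
    """For each smartphone bucket, compute the total price (sum_prices)
    and, separately, the average price per carrier (group_by_avg)."""
    totals = defaultdict(float)
    by_carrier = defaultdict(lambda: defaultdict(list))
    for d in documents:
        totals[d["smartphone"]] += d["price"]
        by_carrier[d["smartphone"]][d["carrier"]].append(d["price"])

    out = {}
    for phone in totals:
        averages = {c: sum(p) / len(p) for c, p in by_carrier[phone].items()}
        out[phone] = {"sum_prices": totals[phone], "group_by_avg": averages}
    return out


summary = summarize(docs)
print(summary)
```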

Elasticsearch : Alternate approach to get frequency count without using aggregation

We are trying to get the frequency count for search terms using aggregations. Since there are three keys for which we need the frequency count, we are seeing degraded search performance. How can we get the frequency count without aggregations? Please suggest an alternate approach.
Query:
{
"aggs": {
"name_exct": {
"filter": {
"term": {
"name_exct": "test"
}
},
"aggs": {
"name_exct_count": {
"terms": {
"field": "name_exct"
}
}
}
},
"CITY": {
"filter": {
"term": {
"CITY": "US"
}
},
"aggs": {
"CITY_count": {
"terms": {
"field": "CITY"
}
}
}
}
}
}

Division of two fields in Elasticsearch

Currently I am trying to group documents by one field and then sum other fields within each group. I want to get a new value that is the division of the summed fields. My current query is given below.
In my query I am aggregating on the field "a_name" and summing "spend" and "gain". I want to get a new field which is the ratio of the sums (spend/gain).
I tried adding a script but I am getting NaN. Also, to use scripts in aggregations, I first had to enable them in the elasticsearch.yml file:
script.engine.groovy.inline.aggs: on
Query
GET /index1/table1/_search
{
"size": 0,
"query": {
"filtered": {
"query": {
"query_string": {
"query": "*",
"analyze_wildcard": true
}
},
"filter": {
"bool": {
"must": [
{
"term": {
"account_id": 29
}
}
],
"must_not": []
}
}
}
},
"aggs": {
"custom_name": {
"terms": {
"field": "a_name"
},
"aggs": {
"spe": {
"sum": {
"field": "spend"
}
},
"gained": {
"sum": {
"field": "gain"
}
},
"rati": {
"sum": {
"script": "doc['spend'].value/doc['gain'].value"
}
}
}
}
}
}
This particular query shows 'NaN' in the output. If I replace the division with multiplication, the query works.
Essentially, what I am looking for is to divide my two aggregations "spe" and "gained".
Thanks!
It is possible that gain is 0 in some of your documents, in which case the division produces a non-finite value that poisons the sum. You may try changing the script to this instead:
"script": "doc['gain'].value != 0 ? doc['spend'].value / doc['gain'].value : 0"
UPDATE
If you want to compute the ratio of the results of two other metric aggregations, you can do so using a bucket_script aggregation (only available from ES 2.0 onwards, though).
{
...
"aggs": {
"custom_name": {
"terms": {
"field": "a_name"
},
"aggs": {
"spe": {
"sum": {
"field": "spend"
}
},
"gained": {
"sum": {
"field": "gain"
}
},
"rati": {
"bucket_script": {
"buckets_path": {
"totalSpent": "spe",
"totalGained": "gained"
},
"script": "totalSpent / totalGained"
}
}
}
}
}
}
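The two suggestions above compute different things, which a short Python sketch can illustrate: the guarded script sums per-document ratios, while bucket_script divides one bucket-level sum by the other. The sample documents and both helper names are hypothetical, and the code runs locally rather than against Elasticsearch.

```python
from collections import defaultdict

# Hypothetical sample documents; one of them has gain == 0.
docs = [
    {"a_name": "n1", "spend": 10.0, "gain": 5.0},
    {"a_name": "n1", "spend": 20.0, "gain": 0.0},
    {"a_name": "n2", "spend": 9.0, "gain": 3.0},
]


def sum_of_guarded_ratios(documents):
    """Mimics the sum aggregation with the guarded script:
    per-document spend / gain, using 0 where gain is 0."""
    totals = defaultdict(float)
    for d in documents:
        ratio = (d["spend"] / d["gain"]) if d["gain"] != 0 else 0.0
        totals[d["a_name"]] += ratio
    return dict(totals)


def ratio_of_sums(documents):
    """Mimics bucket_script over the "spe" and "gained" sums:
    total spend divided by total gain per bucket (None if total gain is 0)."""
    spend = defaultdict(float)
    gain = defaultdict(float)
    for d in documents:
        spend[d["a_name"]] += d["spend"]
        gain[d["a_name"]] += d["gain"]
    return {n: (spend[n] / gain[n] if gain[n] != 0 else None) for n in spend}


per_doc = sum_of_guarded_ratios(docs)
per_bucket = ratio_of_sums(docs)
print(per_doc)
print(per_bucket)
```

For bucket n1 the two approaches disagree (2.0 versus 6.0), so the bucket_script form is the one that matches "divide my two aggregations".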
