We are applying aggregation and grouping, and need pagination for this.
let body = {
size: item_per_page,
query: {
bool: {
must: [
{ terms: { log_action_master_id: action_type } },
{ match: { [search_by]: searchParams.user_id } },
{ match: { unit_id: searchParams.unit_id } },
{ range: { [search_date]: { gte: from, lte: to } } }
]
}
},
aggs: {
group: {
terms: {
field: "id",
size: item_per_page,
order: { _key: sortDirection }
}
},
types_count: {
value_count: {
field: "id.keyword"
}
}
}
};
You can use one of the options below:
Composite aggregation: combines multiple sources into a single set of buckets and supports pagination and sorting on them. It can only paginate linearly using after_key, i.e. you cannot jump from page 1 to page 3: you fetch "n" records, then pass the returned after_key to fetch the next "n" records.
GET index22/_search
{
"size": 0,
"aggs": {
"ValueCount": {
"value_count": {
"field": "id.keyword"
}
},
"pagination": {
"composite": {
"size": 2,
"sources": [
{
"TradeRef": {
"terms": {
"field": "id.keyword"
}
}
}
]
}
}
}
}
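The linear after_key walk described above can be sketched in Python. Here "search" is a stand-in for whatever client call executes the request against your cluster (an assumption, not part of the original query):

```python
def paginate_composite(search, page_size):
    """Walk all composite buckets linearly, passing each response's
    after_key back as the next request's `after` parameter."""
    composite = {
        "size": page_size,
        "sources": [{"TradeRef": {"terms": {"field": "id.keyword"}}}],
    }
    after = None
    while True:
        if after is not None:
            composite["after"] = after  # resume from the last bucket seen
        resp = search({"size": 0, "aggs": {"pagination": {"composite": composite}}})
        agg = resp["aggregations"]["pagination"]
        yield from agg["buckets"]
        after = agg.get("after_key")
        if after is None or not agg["buckets"]:
            break  # no more pages
```

Because each request needs the previous response's after_key, pages can only be visited in order, which is exactly the limitation described above.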
Include partition: groups the field's values into a number of partitions at query time and processes only one partition in each request. Terms are distributed evenly across the partitions, so you must know the number of distinct terms beforehand; you can use a cardinality aggregation to get that count.
GET index22/_search
{
"size": 0,
"aggs": {
"TradeRef": {
"terms": {
"field": "id.keyword",
"include": {
"partition": 0,
"num_partitions": 3
}
}
}
}
}
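Since partitioned pagination requires knowing the term count up front, the bookkeeping can be sketched as follows (a minimal Python sketch; the field name is carried over from the example above, and the term count is assumed to come from a prior cardinality aggregation):

```python
import math

def partition_pages(term_count, terms_per_page):
    """Number of partitions needed so each request returns at most
    `terms_per_page` distinct terms (term_count from a cardinality agg)."""
    return math.ceil(term_count / terms_per_page)

def partition_agg(partition, num_partitions):
    """Build the request body for one partition of the terms aggregation."""
    return {
        "size": 0,
        "aggs": {"TradeRef": {"terms": {
            "field": "id.keyword",
            "include": {"partition": partition, "num_partitions": num_partitions},
            # size must be large enough to hold one whole partition
            "size": 10000,
        }}},
    }
```

You would issue one request per partition index, from 0 to num_partitions - 1.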
Bucket sort aggregation: sorts the buckets of its parent multi-bucket aggregation. Each bucket may be sorted by its _key, its _count or its sub-aggregations. It only applies to the buckets returned by the parent aggregation, so you will need to set the terms size to 10,000 (the maximum) and truncate the buckets in bucket_sort. You can paginate using from and size, just like in a query. If you have more than 10,000 terms you won't be able to use this approach, since it only selects from the buckets the terms aggregation returns.
GET index22/_search
{
"size": 0,
"aggs": {
"valueCount": {
"value_count": {
"field": "TradeRef.keyword"
}
},
"TradeRef": {
"terms": {
"field": "TradeRef.keyword",
"size": 10000
},
"aggs": {
"my_bucket": {
"bucket_sort": {
"sort": [
{
"_key": {
"order": "asc"
}
}
],
"from": 2,
"size": 1
}
}
}
}
}
}
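The from/size arithmetic used by bucket_sort works like query-level pagination; a small Python helper (a sketch, with 1-based page numbers as an assumption) makes the mapping explicit:

```python
def bucket_sort_page(page, page_size, sort_field="_key", order="asc"):
    """Build a bucket_sort body for the given page, 1-based.
    from/size here behave like query-level pagination, but they
    truncate the buckets already returned by the parent terms agg."""
    return {"bucket_sort": {
        "sort": [{sort_field: {"order": order}}],
        "from": (page - 1) * page_size,
        "size": page_size,
    }}
```

The example above, with from 2 and size 1, corresponds to page 3 with a page size of 1.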
In terms of performance, the composite aggregation is the better choice.
I am currently building the following Elasticsearch 6.8 query/aggregation:
{
"sort": [
{
"DateCreated": {
"order": "desc"
}
}
],
"query": {
"bool": {
"must": [
{
"match": {
"InternalEntityId": "ExampleValue1111"
}
},
{
"match": {
"Direction": "Inbound"
}
}
]
}
},
"aggs": {
"top_ext": {
"terms": {
"field": "ExternalAddress.keyword"
},
"aggs": {
"top_date": {
"top_hits": {
"sort": [
{
"DateCreated": {
"order": "desc"
}
}
],
"size": 1
}
}
}
}
}
}
How do we perform (in the same search) a count of the hits that have no value (a must_not exists-style query) per bucket?
Ideally, alongside the top_ext agg return, each bucket would have a count of the records that have no value.
Thanks!
You can do two things here:
1. Sort the "top_ext" terms agg buckets in ascending order of doc count, and use the top n zero-count buckets.
2. Apply a bucket selector aggregation in parallel to your inner hits, so that only the buckets with zero doc counts appear.
Here is a query DSL that uses both of the above approaches. (You can plug in all the other required elements of the query; I have focused mainly on the aggregation part here.)
GET kibana_sample_data_ecommerce/_search
{
"size": 0,
"aggs": {
"outer": {
"terms": {
"field": "products.category.keyword",
"size": 10,
"order": {
"_count": "asc"
}
},
"aggs": {
"inner": {
"top_hits": {
"size": 10
}
},
"restrictedBuckets": {
"bucket_selector": {
"buckets_path": {
"docCount": "_count"
},
"script": "params.docCount<1"
}
}
}
}
}
}
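Note that a terms aggregation only returns zero-document buckets when min_doc_count is set to 0, so there is something for the selector to keep. The bucket_selector itself is just a per-bucket predicate; its effect can be mirrored client-side like this (illustrative Python, not part of the answer's query):

```python
def select_buckets(buckets, predicate):
    """Client-side analogue of bucket_selector: keep only the buckets
    whose doc_count satisfies the predicate (e.g. fewer than 1 doc,
    matching the `params.docCount < 1` script above)."""
    return [b for b in buckets if predicate(b["doc_count"])]
```

For example, select_buckets(buckets, lambda c: c < 1) keeps only empty buckets.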
I have tried to integrate group-by with Elasticsearch, but I am not getting the result I expect. Please help me fix this issue. The indexed data is:
data = [
{ "fruit":"apple", "taste":5, "timestamp":100},
{ "fruit":"pear", "taste":5, "timestamp":110},
{ "fruit":"apple", "taste":4, "timestamp":200},
{ "fruit":"pear", "taste":8, "timestamp":90},
{ "fruit":"banana", "taste":5, "timestamp":100}]
My query is,
myQuery = {"query": {
"match_all": {}
},
"aggs": {
"group_by_fruit": {
"terms": {
"field": "fruit.keyword"
},
}
}
}
It is showing all 5 documents in the output, but I actually need to get only 3 records. The expected result is:
[
{ "fruit":"apple", "taste":4, "timestamp":200},
{ "fruit":"pear", "taste":8, "timestamp":90},
{ "fruit":"banana", "taste":5, "timestamp":100}]
If you want to get the documents with distinct fruit fields having the largest timestamp value, you should use a top_hits aggregation.
{
"query": {
"match_all": {}
},
"size": 0,
"aggs": {
"top_tags": {
"terms": {
"field": "fruit.keyword",
"size": <MAX_NUMBER_OF_DISTINCT_FRUITS>
},
"aggs": {
"group_by_fruit": {
"top_hits": {
"sort": [
{
"timestamp": {
"order": "desc"
}
}
],
"size" : 1
}
}
}
}
}
}
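The terms plus top_hits (size 1, timestamp descending) pair above is equivalent to this plain group-by, shown only to make the result concrete. Note that for "pear", the document with the largest timestamp is the one with timestamp 110, not the 90 shown in the question's expected list:

```python
def latest_per_fruit(docs):
    """Keep, for each fruit, the document with the largest timestamp --
    the same result the terms + top_hits(size=1, sort desc) pair returns."""
    best = {}
    for doc in docs:
        current = best.get(doc["fruit"])
        if current is None or doc["timestamp"] > current["timestamp"]:
            best[doc["fruit"]] = doc
    return list(best.values())

# The sample data from the question:
data = [
    {"fruit": "apple", "taste": 5, "timestamp": 100},
    {"fruit": "pear", "taste": 5, "timestamp": 110},
    {"fruit": "apple", "taste": 4, "timestamp": 200},
    {"fruit": "pear", "taste": 8, "timestamp": 90},
    {"fruit": "banana", "taste": 5, "timestamp": 100},
]
```

Running latest_per_fruit(data) yields one document per distinct fruit, three in total.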
We are trying to get the frequency count for search terms using aggregations. Since there are three keys for which we need the frequency count, we are facing performance degradation on search. How can we get the frequency count without aggregations? Please suggest an alternate approach.
Query:
{
"aggs": {
"name_exct": {
"filter": {
"term": {
"name_exct": "test"
}
},
"aggs": {
"name_exct_count": {
"terms": {
"field": "name_exct"
}
}
}
},
"CITY": {
"filter": {
"term": {
"CITY": "US"
}
},
"aggs": {
"CITY_count": {
"terms": {
"field": "CITY"
}
}
}
}
}
}
Currently I am trying to group documents by one field and then get the sum of other fields within each group. I also want a new value which is the division of the two summed fields. Here is the current query I have:
In my query I am aggregating on the field ("a_name") and summing "spend" and "gain". I want a new field which is the ratio of the sums (spend/gain).
I tried adding a script but I am getting NaN; also, to enable scripting I first had to turn it on in the elasticsearch.yml file:
script.engine.groovy.inline.aggs: on
Query
GET /index1/table1/_search
{
"size": 0,
"query": {
"filtered": {
"query": {
"query_string": {
"query": "*",
"analyze_wildcard": true
}
},
"filter": {
"bool": {
"must": [
{
"term": {
"account_id": 29
}
}
],
"must_not": []
}
}
}
},
"aggs": {
"custom_name": {
"terms": {
"field": "a_name"
},
"aggs": {
"spe": {
"sum": {
"field": "spend"
}
},
"gained": {
"sum": {
"field": "gain"
}
},
"rati": {
"sum": {
"script": "doc['spend'].value/doc['gain'].value"
}
}
}
}
}
}
This particular query shows 'NaN' in the output. If I replace the division with a multiplication, the query works.
Essentially what I am looking for is to divide my two aggregators, "spe" and "gained".
Thanks!
It might be possible that doc.gain is 0 in some of your documents. You may try changing the script to this instead:
"script": "doc['gain'].value != 0 ? doc['spend'].value / doc['gain'].value : 0"
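The guard is just a zero-denominator check: dividing by zero is what produces the NaN. The same logic as a Python one-liner (illustrative only):

```python
def safe_ratio(spend, gain):
    """Mirror of the script's ternary: return 0 instead of NaN
    when the denominator is zero."""
    return spend / gain if gain != 0 else 0
```

Any document with gain equal to 0 now contributes 0 to the sum instead of poisoning it.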
UPDATE
If you want to compute the ratio of the result of two other metric aggregations, you can do so using a bucket_script aggregation (only available in ES 2.0, though).
{
...
"aggs": {
"custom_name": {
"terms": {
"field": "a_name"
},
"aggs": {
"spe": {
"sum": {
"field": "spend"
}
},
"gained": {
"sum": {
"field": "gain"
}
},
"ratio": {
"bucket_script": {
"buckets_path": {
"totalSpent": "spe",
"totalGained": "gained"
},
"script": "totalSpent / totalGained"
}
}
}
}
}
}
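The division that bucket_script performs per bucket can also be mirrored client-side. This sketch (illustrative Python, with a zero-denominator guard added as an assumption) consumes the spe/gained sums from each terms bucket:

```python
def bucket_ratios(buckets):
    """Compute spend/gain per terms bucket from the two sum
    sub-aggregations -- the same arithmetic bucket_script does
    server-side, guarded against division by zero."""
    out = {}
    for b in buckets:
        gained = b["gained"]["value"]
        out[b["key"]] = b["spe"]["value"] / gained if gained else 0.0
    return out
```

Each entry in the result maps an a_name bucket key to its spend/gain ratio.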
I have a query like so:
{
"sort": [
{
"_geo_distance": {
"geo": {
"lat": 39.802763999999996,
"lon": -105.08748399999999
},
"order": "asc",
"unit": "mi",
"mode": "min",
"distance_type": "sloppy_arc"
}
}
],
"query": {
"bool": {
"minimum_number_should_match": 0,
"should": [
{
"match": {
"name": ""
}
},
{
"match": {
"credit": true
}
}
]
}
}
}
I want my search to always return ALL results, just sorted with those which have matching flags closer to the top.
I would like the sorting priority to go something like:
searchTerm (name, a string)
flags (credit/atm/ada/etc, boolean values)
distance
How can this be achieved?
So far, the query you see above is all I've gotten. I haven't been able to figure out how to always return all results, nor how to incorporate the additional queries into the sort.
I don't believe "sort" is the answer you are looking for, actually. I believe you need a trial-and-error approach starting with a simple "bool" query where you put all your criteria (name, flags, distance). Then you give your name criterion the most weight (boost), a little less to your flags, and even less to the distance calculation.
A "bool" "should" will give you a list of documents sorted by the _score of each, and, depending on how you weight each criterion, the _score is influenced more or less.
Also, returning ALL the elements is not difficult: just add a "match_all": {} to your "bool" "should" query.
This would be a starting point, from my point of view, and, depending on your documents and your requirements (see my comment on your post about the confusion), you would need to adjust the "boost" values, test, adjust again, test again, and so on:
{
"query": {
"bool": {
"should": [
{ "constant_score": {
"boost": 6,
"query": {
"match": { "name": { "query": "something" } }
}
}},
{ "constant_score": {
"boost": 3,
"query": {
"match": { "credit": { "query": true } }
}
}},
{ "constant_score": {
"boost": 3,
"query": {
"match": { "atm": { "query": false } }
}
}},
{ "constant_score": {
"boost": 3,
"query": {
"match": { "ada": { "query": true } }
}
}},
{ "constant_score": {
"query": {
"function_score": {
"functions": [
{
"gauss": {
"geo": {
"origin": {
"lat": 39.802763999999996,
"lon": -105.08748399999999
},
"offset": "2km",
"scale": "3km"
}
}
}
]
}
}
}
},
{
"match_all": {}
}
]
}
}
}
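The gauss function in the function_score clause above scores each document by distance: full score within the offset of the origin, then a Gaussian falloff that reaches the decay value (0.5 by default) at offset + scale. A Python sketch of that curve, using the 2km offset and 3km scale from the query:

```python
import math

def gauss_decay(distance_km, offset_km=2.0, scale_km=3.0, decay=0.5):
    """Elasticsearch-style gauss decay: score is 1.0 inside `offset`,
    then falls off so that it equals `decay` at offset + scale."""
    adjusted = max(0.0, distance_km - offset_km)          # no penalty inside offset
    sigma_sq = -scale_km ** 2 / (2.0 * math.log(decay))   # width from scale/decay
    return math.exp(-adjusted ** 2 / (2.0 * sigma_sq))
```

A document 1km from the origin scores 1.0, one at 5km scores exactly 0.5, and farther documents decay smoothly toward 0, which is what pushes nearby results up without ever excluding distant ones.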