Aggregation on an array of objects

Aggregation on an array of objects - node.js

I have the following data in my Elasticsearch index:
{
...someData,
languages: [
{
language:{_id: 1, name:"English"}
},
{
language:{_id: 2, name:"Arabic"}
}
]
}
But when I aggregate the data using this query
aggs: {
languages: {
terms: {
field: "languages.language._id.keyword",
size: 50
},
aggs: {
value: {
terms: {
field: "languages.language.name.keyword"
}
}
}
}
}
I get the English id with two sub-buckets (Arabic and English), and the same for the Arabic id, because technically both names are present in the same document.
Is there a way to return only the count for the object I need?
Thanks

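You need to define the languages field as nested to aggregate on the individual elements of the array. With the default object mapping, the array is flattened and every id in a document is associated with every name in that document.
Mapping with the nested field: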
PUT index0
{
"mappings": {
"properties": {
"languages":{
"type": "nested"
}
}
}
}
Sample document index:
POST index0/_doc
{
"languages": [
{
"language": {
"_id": 1,
"name": "English"
}
},
{
"language": {
"_id": 2,
"name": "Arabic"
}
}
]
}
Sample Aggregation Query:
{
"size": 0,
"aggs": {
"languages": {
"nested": {
"path": "languages"
},
"aggs": {
"id": {
"terms": {
"field": "languages.language._id",
"size": 10
},
"aggs": {
"name": {
"terms": {
"field": "languages.language.name.keyword",
"size": 10
}
}
}
}
}
}
}
}
Result:
"aggregations" : {
"languages" : {
"doc_count" : 2,
"id" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 1,
"doc_count" : 1,
"name" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "English",
"doc_count" : 1
}
]
}
},
{
"key" : 2,
"doc_count" : 1,
"name" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Arabic",
"doc_count" : 1
}
]
}
}
]
}
}
}
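To make the difference concrete, here is a small plain-JavaScript simulation of how the two mappings count the sample document. It does not call Elasticsearch; the function names flattenedBuckets and nestedBuckets are made up for this illustration, and the logic only approximates what the terms aggregations do.

```javascript
// Sample document from the question.
const sampleDocs = [
  {
    languages: [
      { language: { _id: 1, name: "English" } },
      { language: { _id: 2, name: "Arabic" } }
    ]
  }
];

// Increment counts[key], creating the entry on first use.
function bump(counts, key) {
  counts[key] = (counts[key] || 0) + 1;
}

// Default "object" mapping: each document is flattened into one set of ids
// and one set of names, so every id in a document is paired with every name.
function flattenedBuckets(documents) {
  const buckets = {};
  for (const doc of documents) {
    const ids = new Set(doc.languages.map((l) => l.language._id));
    const names = new Set(doc.languages.map((l) => l.language.name));
    for (const id of ids) {
      buckets[id] = buckets[id] || {};
      for (const name of names) bump(buckets[id], name);
    }
  }
  return buckets;
}

// "nested" mapping: each array element is counted on its own,
// so an id is only paired with the name from the same element.
function nestedBuckets(documents) {
  const buckets = {};
  for (const doc of documents) {
    for (const { language } of doc.languages) {
      buckets[language._id] = buckets[language._id] || {};
      bump(buckets[language._id], language.name);
    }
  }
  return buckets;
}

console.log(flattenedBuckets(sampleDocs));
console.log(nestedBuckets(sampleDocs));
```

The flattened version reproduces the behaviour described in the question (both names under each id), while the nested version pairs each id only with its own name.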

Did you try this:
aggs: {
languages: {
terms: {
field: "languages.language._id.keyword",
size: 50
},
}
}
You do not need the other aggregation. You can read the count from the doc_count key of each bucket.
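For reference, reading the counts out of such a response in Node.js could look like the sketch below. The response object is a hand-written stand-in shaped like a terms aggregation result, and bucketCounts is a hypothetical helper name, not part of the Elasticsearch client.

```javascript
// Hand-written stand-in for a terms aggregation response.
const response = {
  aggregations: {
    languages: {
      buckets: [
        { key: "1", doc_count: 1 },
        { key: "2", doc_count: 1 }
      ]
    }
  }
};

// Map each bucket key to its doc_count.
function bucketCounts(termsAgg) {
  const counts = {};
  for (const bucket of termsAgg.buckets) {
    counts[bucket.key] = bucket.doc_count;
  }
  return counts;
}

const countsById = bucketCounts(response.aggregations.languages);
console.log(countsById);
```

Keep in mind that doc_count is the number of documents that contain the id, not the number of array elements.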

Related

Elastic Search multi match query can't ignore special characters

I have a name field with the value "abc_name". When I search for "abc_" I get the expected results, but when I search for "abc_##£&-#&" I still get the same results. I do not want a query containing special characters that do not match my data to return these results.
My query uses:
multi_match
type cross_fields
operator AND
I am using the standard search_analyzer for my fields.
I want to keep the mapping below as it is, otherwise it will affect my other search behaviour:
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"analyzer": "autocomplete",
"search_analyzer": "standard"
}

Please see the sample below, where I've created a custom analyzer that should fit your use case:
Sample Mapping:
PUT some_test_index
{
"settings": {
"analysis": {
"analyzer": {
"my_custom_analyzer": {
"type": "custom",
"tokenizer": "custom_tokenizer",
"filter": ["lowercase", "3_5_edge_ngram"]
}
},
"tokenizer": {
"custom_tokenizer": {
"type": "pattern",
"pattern": "\\w+_+[^a-zA-Z\\d\\s_]+|\\s+"
}
},
"filter": {
"3_5_edge_ngram": {
"type": "edge_ngram",
"min_gram": 3,
"max_gram": 5
}
}
}
},
"mappings": {
"properties": {
"my_field":{
"type": "text",
"analyzer": "my_custom_analyzer"
}
}
}
}
The pattern tokenizer splits the text wherever the pattern matches and discards the matched text, so input of the form abc_$%^^## is consumed as a separator and never indexed as a token.
Note that the analyzer works as follows:
First it runs the tokenizer.
Then it applies the lowercase and edge_ngram filters to the generated tokens.
To see which tokens the tokenizer alone generates, remove the edge_ngram filter from the mapping above and call the Analyze API:
POST some_test_index/_analyze
{
"analyzer": "my_custom_analyzer",
"text": "abc_name asda efg_!##!## 1213_adav"
}
Tokens generated:
{
"tokens" : [
{
"token" : "abc_name",
"start_offset" : 0,
"end_offset" : 8,
"type" : "word",
"position" : 0
},
{
"token" : "asda",
"start_offset" : 9,
"end_offset" : 13,
"type" : "word",
"position" : 1
},
{
"token" : "1213_adav",
"start_offset" : 25,
"end_offset" : 34,
"type" : "word",
"position" : 2
}
]
}
Note that the token efg_!##!## has been removed.
I added the edge_ngram filter so that a search for abc_ succeeds when the token produced by the tokenizer is abc_name.
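The tokenizer and filter steps can also be imitated in plain JavaScript to check the reasoning without a cluster. This is only a rough sketch: it reuses the same regular expression, but tokenize, edgeNgrams, analyze and matches are illustrative names, and the real analysis chain in Elasticsearch may differ in details.

```javascript
// Same regular expression as the "pattern" tokenizer in the mapping above.
// A pattern tokenizer splits the text on every match, so matched text is dropped.
const SEPARATOR = /\w+_+[^a-zA-Z\d\s_]+|\s+/;

// Split on the separator pattern and drop the empty pieces left between
// adjacent separators.
function tokenize(text) {
  return text.split(SEPARATOR).filter((piece) => piece.length > 0);
}

// Approximation of the edge_ngram token filter: emit prefixes of length
// minGram..maxGram; tokens shorter than minGram produce nothing.
function edgeNgrams(token, minGram, maxGram) {
  const grams = [];
  for (let n = minGram; n <= Math.min(maxGram, token.length); n++) {
    grams.push(token.slice(0, n));
  }
  return grams;
}

// Tokenize, lowercase, then apply edge n-grams (3..5 as in the mapping).
function analyze(text) {
  return tokenize(text)
    .map((token) => token.toLowerCase())
    .flatMap((token) => edgeNgrams(token, 3, 5));
}

const indexedTerms = new Set(analyze("abc_name asda efg_!##!## 1213_adav"));

// Simplified match query: succeeds if any analyzed query term was indexed.
function matches(queryText) {
  return analyze(queryText).some((term) => indexedTerms.has(term));
}

console.log(tokenize("abc_name asda efg_!##!## 1213_adav"));
console.log(matches("abc_"));
console.log(matches("efg_!##!##"));
```

In this simulation the query abc_ finds a shared term, while efg_!##!## produces no terms at all, so nothing can match.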
Sample Document:
POST some_test_index/_doc/1
{
"my_field": "abc_name asda efg_!##!## 1213_adav"
}
Query Request:
Use-case 1:
POST some_test_index/_search
{
"query": {
"match": {
"my_field": "abc_"
}
}
}
Use-case 2:
POST some_test_index/_search
{
"query": {
"match": {
"my_field": "efg_!##!##"
}
}
}
Responses:
Response for use-case 1:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.47992462,
"hits" : [
{
"_index" : "some_test_index",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.47992462,
"_source" : {
"my_field" : "abc_name asda efg_!##!## 1213_adav"
}
}
]
}
}
Response for use-case 2:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
Updated Answer:
Create your mapping as follows, based on the index I created above, and let me know if that works. It assumes the autocomplete analyzer is already defined in your settings; the important change is that search_analyzer now points to my_custom_analyzer:
PUT some_test_index
{
"settings": {
"analysis": {
"analyzer": {
"my_custom_analyzer": {
"type": "custom",
"tokenizer": "punctuation",
"filter": ["lowercase"]
}
},
"tokenizer": {
"punctuation": {
"type": "pattern",
"pattern": "\\w+_+[^a-zA-Z\\d\\s_]+|\\s+"
}
}
}
},
"mappings": {
"properties": {
"my_field":{
"type": "text",
"analyzer": "autocomplete",
"search_analyzer": "my_custom_analyzer"
}
}
}
}
Please try it and let me know if it works for all your use cases.

ElasticSearch NodeJS - Aggregation term return more than one source property

I need to get a unique list of things along with some of the properties attached to them. As of now this just returns a unique list of names. What do I do if I also want to include the id of the aggregated docs?
I'm using the elasticsearch npm module with the .search() method
Any help would be greatly appreciated.
params.body.aggs = {
uniqueCoolThings: {
terms: {
field: 'cool_thing.name.keyword'
}
}
}
This returns a list of { key, doc_count }, but I want { key, id, doc_count }.
That works! Thank you Technocrat Sid!
So what if my docs look like this:
{ cool_things: [{ name, id }, { name, id }] }
How would I find the id of the element the current bucket refers to? For example, this is the working query:
params.body.aggs = {
uniqueCoolThings: {
terms: {
field: 'cool_things.name.keyword'
},
aggs: {
value: {
top_hits: {
size: 1,
_source: {
includes: ['cool_things.id']
}
}
}
}
}
}
Yet this will return
...hits._source: {
uniqueCoolThings: [
{
"id": 500
},
{
"id": 501
}
]
} ...
I'm wondering how to do a where condition so that it will only return the ID that matches the unique cool_things.name.keyword it is currently on.

The closest you can get is a top_hits aggregation used as a sub-aggregation, which keeps track of the documents aggregated into each bucket.
Example:
A similar terms aggregation query:
"aggs": {
"uniqueCoolThings": {
"terms": {
"field": "cool_thing.name.keyword"
}
}
}
will return the following results:
"aggregations": {
"uniqueCoolThings": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "XYZ",
"doc_count": 2
},
{
"key": "ABC",
"doc_count": 1
}
]
}
}
And if you add top hits aggregation as a sub aggregation to the above query:
"aggs": {
"uniqueCoolThings": {
"terms": {
"field": "cool_thing.name.keyword"
},
"aggs": {
"value": {
"top_hits": {
"_source": "false"
}
}
}
}
}
You'll get the following result:
"aggregations": {
"uniqueCoolThings": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "XYZ",
"doc_count": 2,
"value": {
"hits": {
"total": 2,
"max_score": 1,
"hits": [
{
"_index": "product",
"_type": "_doc",
"_id": "BqGhPGgBOkyOnpPCsRPX",
"_score": 1,
"_source": {}
},
{
"_index": "product",
"_type": "_doc",
"_id": "BaGhPGgBOkyOnpPCfxOx",
"_score": 1,
"_source": {}
}
]
}
}
}
....
.... excluding output for brevity !!
Notice that in the above result the _id of each aggregated document (value.hits.hits._id) is available within its terms bucket.
I am not sure of the exact syntax, but something like this should work for you:
params.body.aggs = {
uniqueCoolThings: {
terms: {
field: 'cool_thing.name.keyword'
},
aggs: {
value: {
top_hits: {
_source: 'false'
}
}
}
}
}
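Once the response comes back, the { key, id, doc_count } shape asked for in the question can be assembled on the client. The sketch below works on a trimmed, hand-copied version of the response shown above; summarizeBuckets and idForKey are hypothetical helper names. The idForKey part addresses the follow-up case with an array of cool_things and assumes, beyond what the answer states, that both cool_things.name and cool_things.id are included in _source.

```javascript
// Trimmed copy of the response shape shown above.
const aggregations = {
  uniqueCoolThings: {
    buckets: [
      {
        key: "XYZ",
        doc_count: 2,
        value: {
          hits: {
            hits: [
              { _id: "BqGhPGgBOkyOnpPCsRPX" },
              { _id: "BaGhPGgBOkyOnpPCfxOx" }
            ]
          }
        }
      }
    ]
  }
};

// Turn each terms bucket into { key, ids, doc_count }.
// "value" is the name given to the top_hits sub-aggregation in the query.
function summarizeBuckets(termsAgg) {
  return termsAgg.buckets.map((bucket) => ({
    key: bucket.key,
    ids: bucket.value.hits.hits.map((hit) => hit._id),
    doc_count: bucket.doc_count
  }));
}

// Follow-up case (assumption: _source contains cool_things with name and id):
// pick the id of the array element whose name equals the bucket key.
function idForKey(source, key) {
  const match = source.cool_things.find((thing) => thing.name === key);
  return match ? match.id : undefined;
}

const summary = summarizeBuckets(aggregations.uniqueCoolThings);
console.log(summary);
console.log(idForKey({ cool_things: [{ name: "XYZ", id: 500 }, { name: "ABC", id: 501 }] }, "ABC"));
```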

How to calculate total for each token in Elasticsearch

I send the following request to Elasticsearch:
{
"query":{
"bool":{
"must":[
{
"query_string":{
"query":"something1 OR something2 OR something3",
"default_operator":"OR"
}
}
],
"filter":{
"range":{
"time":{
"gte":date
}
}
}
}
}
}
I want to count the documents matching each token, across all documents, in a single Elasticsearch request. For example:
something1: 26 documents
something2: 12 documents
something3: 1 document

Assuming that the tokens are not akin to enumerations (i.e. a constrained set of specific values, like state names, for which a terms aggregation with the right mapping would be your best bet), I think the closest thing to what you want is a filters aggregation:
POST your-index/_search
{
"query":{
"bool":{
"must":[
{
"query_string":{
"query":"something1 OR something2 OR something3",
"default_operator":"OR"
}
}
],
"filter":{
"range":{
"time":{
"gte":date
}
}
}
}
},
"aggs": {
"token_doc_counts": {
"filters" : {
"filters" : {
"something1" : {
"bool": {
"must": { "query_string" : { "query" : "something1" } },
"filter": { "range": { "time": { "gte": date } } }
}
},
"something2" : {
"bool": {
"must": { "query_string" : { "query" : "something2" } },
"filter": { "range": { "time": { "gte": date } } }
}
},
"something3" : {
"bool": {
"must": { "query_string" : { "query" : "something3" } },
"filter": { "range": { "time": { "gte": date } } }
}
}
}
}
}
}
}
The response would look something like:
{
"took": 9,
"timed_out": false,
"_shards": ...,
"hits": ...,
"aggregations": {
"token_doc_counts": {
"buckets": {
"something1": {
"doc_count": 1
},
"something2": {
"doc_count": 2
},
"something3": {
"doc_count": 3
}
}
}
}
}
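Since the question is tagged for programmatic use, the aggregation body can be generated from the token list instead of written by hand. The following is a minimal sketch; buildTokenCountAggs and readTokenCounts are made-up names. It leaves out the repeated range filter on the assumption that the top-level query already restricts the documents being aggregated, and it assumes the tokens are simple words that need no query_string escaping.

```javascript
// Build one named filter per token for a "filters" aggregation.
// Assumption: the time range is already applied by the top-level query.
function buildTokenCountAggs(tokens) {
  const filters = {};
  for (const token of tokens) {
    filters[token] = { query_string: { query: token } };
  }
  return { token_doc_counts: { filters: { filters } } };
}

// Read { token: doc_count } out of the aggregations part of a response.
function readTokenCounts(aggregations) {
  const counts = {};
  const buckets = aggregations.token_doc_counts.buckets;
  for (const [token, bucket] of Object.entries(buckets)) {
    counts[token] = bucket.doc_count;
  }
  return counts;
}

const aggs = buildTokenCountAggs(["something1", "something2", "something3"]);
console.log(JSON.stringify(aggs, null, 2));

// Aggregations section copied from the sample response above.
const sampleAggregations = {
  token_doc_counts: {
    buckets: {
      something1: { doc_count: 1 },
      something2: { doc_count: 2 },
      something3: { doc_count: 3 }
    }
  }
};
console.log(readTokenCounts(sampleAggregations));
```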

You can split your query into filters aggregation of three filters. For reference look here: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-filters-aggregation.html

What you need to do is create a copy_to field, with a mapping like the one shown below.
Depending on which fields your query_string searches, you need to add copy_to to some or all of them.
By default query_string searches all fields, so you may need to specify copy_to on every field, as in the mapping below. For the sake of simplicity I've created only three fields: title, field_2, and a third field, content, which acts as the copy_to target.
Mapping
PUT <your_index_name>
{
"mappings": {
"mydocs": {
"properties": {
"title": {
"type": "text",
"copy_to": "content"
},
"field_2": {
"type": "text",
"copy_to": "content"
},
"content": {
"type": "text",
"fielddata": true
}
}
}
}
}
Sample Documents
POST <your_index_name>/mydocs/1
{
"title": "something1",
"field_2": "something2"
}
POST <your_index_name>/mydocs/2
{
"title": "something2",
"field_2": "something3"
}
Query:
The aggregation query below, which uses a terms aggregation, returns the document count for each token:
POST <your_index_name>/_search
{
"size": 0,
"query": {
"query_string": {
"query": "something1 OR something2 OR something3"
}
},
"aggs": {
"myaggs": {
"terms": {
"field": "content",
"include" : ["something1","something2","something3"]
}
}
}
}
Query Response:
{
"took": 7,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0,
"hits": []
},
"aggregations": {
"myaggs": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "something2",
"doc_count": 2
},
{
"key": "something1",
"doc_count": 1
},
{
"key": "something3",
"doc_count": 1
}
]
}
}
}
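To see where these counts come from, the copy_to and terms steps can be mimicked in plain JavaScript on the two sample documents. This is only an approximation under stated assumptions: tokens is a crude stand-in for the standard analyzer, and unlike the real terms aggregation this sketch also reports tokens with a count of zero.

```javascript
// The two sample documents indexed above.
const indexedDocs = [
  { title: "something1", field_2: "something2" },
  { title: "something2", field_2: "something3" }
];

// Rough stand-in for the standard analyzer: lowercase, split on non-word characters.
function tokens(text) {
  return text.toLowerCase().split(/\W+/).filter(Boolean);
}

// copy_to: the "content" field sees the tokens of every field copied into it.
function contentTokens(doc) {
  return new Set([...tokens(doc.title), ...tokens(doc.field_2)]);
}

// terms aggregation with "include": count the documents containing each listed token.
function countDocsPerToken(documents, include) {
  const counts = {};
  for (const token of include) counts[token] = 0;
  for (const doc of documents) {
    const present = contentTokens(doc);
    for (const token of include) {
      if (present.has(token)) counts[token] += 1;
    }
  }
  return counts;
}

const tokenCounts = countDocsPerToken(indexedDocs, ["something1", "something2", "something3"]);
console.log(tokenCounts);
```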
Let me know if it helps!

Filter parent by children aggregation in Elasticsearch

I have a parent-children relationship in my ES mapping and I want to filter the parents by the value of an aggregation (avg) on their children. That is, I only want to retrieve parents where that value is within a given range.
I tried to do it with aggs and post-filters but couldn't get it to work.
{
"apartments" : {
"mappings" : {
"apartment_availability" : {
"_parent" : {
"type" : "apartment"
},
"_routing" : {
"required" : true
},
"properties" : {
"availability_date" : {
"type" : "date"
},
"apartment_id" : {
"type" : "long"
},
"id" : {
"type" : "long"
},
"price_cents" : {
"type" : "long"
},
"status" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
"apartment" : {
"properties" : {
"id" : {
"type" : "long"
},
}
}
}
}
}
If our users select a period of March 24th-March 31st and a price range of €150-€300, then we want to show them all apartments that are free in that period and whose average price for that period is in the €150-€300 range.
Here's what we have so far:
{
"query": {
"bool": {
"filter": {
"bool": {
"must": [{
"has_child": {
"type": "apartment_availability",
"min_children": 8,
"max_children": 8,
"query": {
"bool": {
"must": [{
"term": {
"status": "available"
}
}, {
"range": {
"availability_date": {
"gte": "2017-03-24",
"lte": "2017-03-31"
}
}
}]
}
}
}
}]
}
}
}
}
}

My suggestion is to use a bucket_selector aggregation to filter the apartments:
GET /apartments/apartment/_search
{
"query": {
"bool": {
"filter": {
"bool": {
"must": [
{
"has_child": {
"type": "apartment_availability",
"query": {
"bool": {
"must": [
{
"term": {
"status": "available"
}
},
{
"range": {
"availability_date": {
"gte": "2017-04-01",
"lte": "2017-04-03"
}
}
}
]
}
}
}
}
]
}
}
}
},
"aggs": {
"apartments_ids": {
"terms": {
"field": "id",
"size": 10
},
"aggs": {
"avails": {
"children": {
"type": "apartment_availability"
},
"aggs": {
"filter_avails": {
"filter": {
"bool": {
"must": [
{
"term": {
"status": "available"
}
},
{
"range": {
"availability_date": {
"gte": "2017-04-01",
"lte": "2017-04-03"
}
}
}
]
}
},
"aggs": {
"average": {
"avg": {
"field": "price_cents"
}
}
}
}
}
},
"avg_bucket_filter": {
"bucket_selector": {
"buckets_path": {
"avg": "avails>filter_avails.average"
},
"script": "params.avg > 150 && params.avg < 300"
}
}
}
}
}
}
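The logic of this aggregation tree (filter the children, average their price per apartment, keep only the buckets whose average is in range) can be sketched in plain JavaScript. The availability records below are invented for illustration and selectApartments is a hypothetical name. The thresholds are passed in as parameters and must be expressed in the same unit as price_cents.

```javascript
// Hypothetical availability records (child documents), one per apartment per day.
const availabilities = [
  { apartment_id: 1, availability_date: "2017-04-01", status: "available", price_cents: 200 },
  { apartment_id: 1, availability_date: "2017-04-02", status: "available", price_cents: 220 },
  { apartment_id: 2, availability_date: "2017-04-01", status: "available", price_cents: 400 },
  { apartment_id: 2, availability_date: "2017-04-02", status: "available", price_cents: 500 },
  { apartment_id: 3, availability_date: "2017-04-01", status: "available", price_cents: 100 },
  { apartment_id: 3, availability_date: "2017-04-02", status: "booked", price_cents: 250 }
];

function selectApartments(records, { from, to, minAvg, maxAvg }) {
  // Per-apartment running totals (the "terms" + "children" levels).
  const sums = new Map();
  for (const record of records) {
    // Same conditions as "filter_avails": available and inside the date range.
    // ISO dates in YYYY-MM-DD form compare correctly as strings.
    if (record.status !== "available") continue;
    if (record.availability_date < from || record.availability_date > to) continue;
    const entry = sums.get(record.apartment_id) || { total: 0, count: 0 };
    entry.total += record.price_cents;
    entry.count += 1;
    sums.set(record.apartment_id, entry);
  }
  const selected = [];
  for (const [apartmentId, { total, count }] of sums) {
    const average = total / count;
    // Same comparison as the bucket_selector script (exclusive bounds).
    if (average > minAvg && average < maxAvg) {
      selected.push({ apartment_id: apartmentId, average });
    }
  }
  return selected;
}

const selectedApartments = selectApartments(availabilities, {
  from: "2017-04-01",
  to: "2017-04-03",
  minAvg: 150,
  maxAvg: 300
});
console.log(selectedApartments);
```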

Search query to retrieve nested documents in elasticsearch with _source disabled

I have the following mapping
{
"cloth": {
"dynamic" : false,
"_source" : {"enabled" : false },
"properties": {
"name": {
"type": "string",
"index": "analyzed"
},
"variation": {
"type": "nested",
"properties": {
"size": {
"type": "string",
"index": "not_analyzed"
},
"color": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
I am not able to figure out a way to retrieve the nested object fields using the fields query.
{
"fields" : ["name" , "variation.size", "variation.color"],
"query" : {
"nested" : {
"path" : "variation",
"query" : {
"bool" : {
"must" : [
{ "term" : { "variation.size" : "XXL" } },
{ "term" : { "variation.color" : "red" } }
]
}
}
}
}
}
The above query returns
"_id" : "1",
"_score" : 1.987628,
"fields" : {
"variation.size" : [ "XXL", "XL" ],
"variation.color" : [ "red", "black" ],
"name" : [ "Test shirt" ]
}
When I tried
"fields" : ["name" , "variation"]
I got the error
status: 400
reason: "ElasticsearchIllegalArgumentException[field [variation] isn't a leaf field]"
Which is as expected.
How can I get the variation object as it is?
Expected result: I need to retrieve the variation object as a whole so that I can preserve the association of size and color, like "red" with "XXL".
"variation" : { "XXL" , "red"}
Update: Source is disabled for this Index Type.

If you use source filtering, the nested objects are returned as a whole (this requires _source to be enabled). Your query would be:
{
"_source": [
"name",
"variation"
],
"query": {
"nested": {
"path": "variation",
"query": {
"bool": {
"must": [
{
"term": {
"variation.size": "XXL"
}
},
{
"term": {
"variation.color": "red"
}
}
]
}
}
}
}
}

You should use this:
"script_fields": {
"variation": {
"script": {
"inline": "doc['variation.size'].value + ' ' + doc['variation.color'].value"
}
}
}
I use elasticsearch v. 5.1.1
