Aggregation on an array of objects - node.js

I have the following data in my elastic.
{
...someData,
languages: [
{
language:{_id: 1, name:"English"}
},
{
language:{_id: 2, name:"Arabic"}
}
]
}
But when I aggregate the data using this query
aggs: {
languages: {
terms: {
field: "languages.language._id.keyword",
size: 50
},
aggs: {
value: {
terms: {
field: "languages.language.name.keyword"
}
}
}
}
}
I will get the English id with 2 buckets for Arabic and English
and same for Arabic id, because technically its included there.
Is there a way to return only the count of the object I need?
Thanks

You need to define languages field as nested for applying aggregation on individual element of array.
Configured Nested field:
PUT index0
{
"mappings": {
"properties": {
"languages":{
"type": "nested"
}
}
}
}
Sample document index:
POST index0/_doc
{
"languages": [
{
"language": {
"_id": 1,
"name": "English"
}
},
{
"language": {
"_id": 2,
"name": "Arabic"
}
}
]
}
Sample Aggregation Query:
{
"size": 0,
"aggs": {
"languages": {
"nested": {
"path": "languages"
},
"aggs": {
"id": {
"terms": {
"field": "languages.language._id",
"size": 10
},
"aggs": {
"name": {
"terms": {
"field": "languages.language.name.keyword",
"size": 10
}
}
}
}
}
}
}
}
Result:
"aggregations" : {
"languages" : {
"doc_count" : 2,
"id" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 1,
"doc_count" : 1,
"name" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "English",
"doc_count" : 1
}
]
}
},
{
"key" : 2,
"doc_count" : 1,
"name" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Arabic",
"doc_count" : 1
}
]
}
}
]
}
}
}

Did you try this :
aggs: {
languages: {
terms: {
field: "languages.language._id.keyword",
size: 50
},
}
}
You do not need the other aggregation. You can access using the doc_count key

Related

Elastic Search multi match query can't ignore special characters

I have a name field value as "abc_name" so when I search "abc_" I am getting proper results but when I search "abc_##£&-#&" still I am getting same results. I want my query to ignore this special characters that doesn't matches with my query.
My query has:
Multi_match
type as cross_fields
operator AND
I am using search_analyzer standard for my Fields
And I want this structure as it is otherwise it will affect my other Search behaviour
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"analyzer": "autocomplete",
"search_analyzer": "standard"
}
Please see the below sample which would fit your use case where I've created a custom analyzer which would fit your use case:
Sample Mapping:
PUT some_test_index
{
"settings": {
"analysis": {
"analyzer": {
"my_custom_analyzer": {
"type": "custom",
"tokenizer": "custom_tokenizer",
"filter": ["lowercase", "3_5_edge_ngram"]
}
},
"tokenizer": {
"custom_tokenizer": {
"type": "pattern",
"pattern": "\\w+_+[^a-zA-Z\\d\\s_]+|\\s+". <---- Note this pattern
}
},
"filter": {
"3_5_edge_ngram": {
"type": "edge_ngram",
"min_gram": 3,
"max_gram": 5
}
}
}
},
"mappings": {
"properties": {
"my_field":{
"type": "text",
"analyzer": "my_custom_analyzer"
}
}
}
}
The above mentioned pattern would simply ignore the tokens with the format like abc_$%^^##. As a result this token would not be indexed.
Note that the way the analyzer works is:
First executes tokenizer
Then applies the edge_ngram filter on the tokens generated.
You can verify by simply removing the edge_ngram filter in the above mapping to first understand what tokens are getting generated via Analyze API which would be as below:
POST some_test_index/_analyze
{
"analyzer": "my_custom_analyzer",
"text": "abc_name asda efg_!##!## 1213_adav"
}
Tokens generated:
{
"tokens" : [
{
"token" : "abc_name",
"start_offset" : 0,
"end_offset" : 8,
"type" : "word",
"position" : 0
},
{
"token" : "asda",
"start_offset" : 9,
"end_offset" : 13,
"type" : "word",
"position" : 1
},
{
"token" : "1213_adav",
"start_offset" : 25,
"end_offset" : 34,
"type" : "word",
"position" : 2
}
]
}
Note that the token efg_!##!## has been removed.
I've added edge_ngram fitler as you would want the search to be successful if you search with abc_ if your tokens generated via tokenizer is abc_name.
Sample Document:
POST some_test_index/_doc/1
{
"my_field": "abc_name asda efg_!##!## 1213_adav"
}
Query Request:
Use-case 1:
POST some_test_index/_search
{
"query": {
"match": {
"my_field": "abc_"
}
}
}
Use-case-2:
POST some_test_index/_search
{
"query": {
"match": {
"my_field": "efg_!##!##"
}
}
}
Responses:
Response for use-case-1:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.47992462,
"hits" : [
{
"_index" : "some_test_index",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.47992462,
"_source" : {
"my_field" : "abc_name asda efg_!##!## 1213_adav"
}
}
]
}
}
Response for use-case-2:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
Updated Answer:
Create your mapping as follows based on the index I've created and let me know if that works:
PUT some_test_index
{
"settings": {
"analysis": {
"analyzer": {
"my_custom_analyzer": {
"type": "custom",
"tokenizer": "punctuation",
"filter": ["lowercase"]
}
},
"tokenizer": {
"punctuation": {
"type": "pattern",
"pattern": "\\w+_+[^a-zA-Z\\d\\s_]+|\\s+"
}
}
}
},
"mappings": {
"properties": {
"my_field":{
"type": "text",
"analyzer": "autocompete", <----- Assuming you have already this in setting
"search_analyzer": "my_custom_analyzer". <----- Note this
}
}
}
}
Please try and let me know if this works for all your use-cases.

ElasticSearch NodeJS - Aggregation term return more than one source property

I need to get a unique list of things, with some of the properties that are attached. As of now this just returns a unique list of names, yet if I wanted to include the id of the aggregates doc's, what do I do?
I'm using the elasticsearch npm module with the .search() method
Any help would be greatly appreciated.
params.body.aggs = {
uniqueCoolThings: {
terms: {
field: 'cool_thing.name.keyword'
}
}
}
This will return a list of { key, doc_count } I want { key, id, doc_count }
That works! Thank you Technocrat Sid!
So what if my docs looks like this
{ cool_things: [{ name, id }, { name, id }] }
How would I find the id of the one I'm currently in the hit. For example this is the working query.
params.body.aggs = {
uniqueCoolThings: {
terms: {
field: 'cool_things.name.keyword'
},
aggs: {
value: {
top_hits: {
size: 1,
_source: {
includes: ['cool_things.id']
}
}
}
}
}
}
}
Yet this will return
...hits._source: {
uniqueCoolThings: [
{
"id": 500
},
{
"id": 501
}
]
} ...
I'm wondering how to do a where condition so that it will only return the ID that matches the unique cool_things.name.keyword it is currently on.
At most you can use top hits aggregation as a sub aggregation which keeps the track of the aggregated documents.
Example:
A similar terms aggregation query:
"aggs": {
"uniqueCoolThings": {
"terms": {
"field": "cool_thing.name.keyword"
}
}
}
will return the following results:
"aggregations": {
"uniqueCoolThings": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "XYZ",
"doc_count": 2
},
{
"key": "ABC",
"doc_count": 1
}
]
}
}
And if you add top hits aggregation as a sub aggregation to the above query:
"aggs": {
"uniqueCoolThings": {
"terms": {
"field": "cool_thing.name.keyword"
},
"aggs": {
"value": {
"top_hits": {
"_source": "false"
}
}
}
}
}
You'll get the following result:
"aggregations": {
"uniqueCoolThings": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "XYZ",
"doc_count": 2,
"value": {
"hits": {
"total": 2,
"max_score": 1,
"hits": [
{
"_index": "product",
"_type": "_doc",
"_id": "BqGhPGgBOkyOnpPCsRPX",
"_score": 1,
"_source": {}
},
{
"_index": "product",
"_type": "_doc",
"_id": "BaGhPGgBOkyOnpPCfxOx",
"_score": 1,
"_source": {}
}
]
}
}
}
....
.... excluding output for brevity !!
Notice in the above result you have the aggregated documents _id(value.hits.hits._id) within your terms bucket.
Not sure of the syntax but something like this should work for you:
params.body.aggs = {
uniqueCoolThings: {
terms: {
field: 'cool_thing.name.keyword'
},
aggs: {
value: {
top_hits: {
_source: 'false'
}
}
}
}
}

How to calculate total for each token in Elasticsearch

I have a request into Elastic
{
"query":{
"bool":{
"must":[
{
"query_string":{
"query":"something1 OR something2 OR something3",
"default_operator":"OR"
}
}
],
"filter":{
"range":{
"time":{
"gte":date
}
}
}
}
}
}
I wanna calculate count for each token in all documents using elastic search in one request, for example:
something1: 26 documents
something2: 12 documents
something3: 1 documents
Assuming that the tokens are not akin to enumerations (i.e. constrained set of specific values, like state names, which would make a terms aggregation your best bet with the right mapping), I think the closest thing to what you want would be to use filters aggregation:
POST your-index/_search
{
"query":{
"bool":{
"must":[
{
"query_string":{
"query":"something1 OR something2 OR something3",
"default_operator":"OR"
}
}
],
"filter":{
"range":{
"time":{
"gte":date
}
}
}
}
},
"aggs": {
"token_doc_counts": {
"filters" : {
"filters" : {
"something1" : {
"bool": {
"must": { "query_string" : { "query" : "something1" } },
"filter": { "range": { "time": { "gte": date } } }
}
},
"something2" : {
"bool": {
"must": { "query_string" : { "query" : "something2" } },
"filter": { "range": { "time": { "gte": date } } }
}
},
"something3" : {
"bool": {
"must": { "query_string" : { "query" : "something3" } },
"filter": { "range": { "time": { "gte": date } } }
}
}
}
}
}
}
}
The response would look something like:
{
"took": 9,
"timed_out": false,
"_shards": ...,
"hits": ...,
"aggregations": {
"token_doc_counts": {
"buckets": {
"something1": {
"doc_count": 1
},
"something2": {
"doc_count": 2
},
"something3": {
"doc_count": 3
}
}
}
}
}
You can split your query into filters aggregation of three filters. For reference look here: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-filters-aggregation.html
What you would need to do, is to create a Copy_To field and have the mapping as shown below.
Depending on the fields that your query_string queries, you need to include some or all of the fields with copy_to field.
By default query_string searches all the fields, so you may need to specify copy_to for all the fields as shown in below mapping, where for sake of simplicity, I've created only three fields, title, field_2 and a third field content which would act as copied to field.
Mapping
PUT <your_index_name>
{
"mappings": {
"mydocs": {
"properties": {
"title": {
"type": "text",
"copy_to": "content"
},
"field_2": {
"type": "text",
"copy_to": "content"
},
"content": {
"type": "text",
"fielddata": true
}
}
}
}
}
Sample Documents
POST <your_index_name>/mydocs/1
{
"title": "something1",
"field_2": "something2"
}
POST <your_index_name>/mydocs/2
{
"title": "something2",
"field_2": "something3"
}
Query:
You'd get the required document counts for the each and every token using the below aggregation query and I've made use of Terms Aggregation:
POST <your_index_name>/_search
{
"size": 0,
"query": {
"query_string": {
"query": "something1 OR something2 OR something3"
}
},
"aggs": {
"myaggs": {
"terms": {
"field": "content",
"include" : ["something1","something2","something3"]
}
}
}
}
Query Response:
{
"took": 7,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0,
"hits": []
},
"aggregations": {
"myaggs": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "something2",
"doc_count": 2
},
{
"key": "something1",
"doc_count": 1
},
{
"key": "something3",
"doc_count": 1
}
]
}
}
}
Let me know if it helps!

Filter parent by children aggregation in Elasticsearch

I have a parent-children relationship in my ES mapping and I want to filter the parents by the value of an aggregation (avg) on their children. That is, I only want to retrieve parents where that value is within a given range.
I tried to do it with aggs and post-filters but couldn't get it to work.
{
"apartments" : {
"mappings" : {
"apartment_availability" : {
"_parent" : {
"type" : "apartment"
},
"_routing" : {
"required" : true
},
"properties" : {
"availability_date" : {
"type" : "date"
},
"apartment_id" : {
"type" : "long"
},
"id" : {
"type" : "long"
},
"price_cents" : {
"type" : "long"
},
"status" : {
"type" : "text",
"fields" : {
"keyword" : {
"type" : "keyword",
"ignore_above" : 256
}
}
}
}
},
"apartment" : {
"properties" : {
"id" : {
"type" : "long"
},
}
}
}
}
}
If oru users select a period of March 24th-March 31st and a price range of €150-€300, then we want to show them all apartments that are free in that period and whose average price for that period is in the €150-€300 range.
Here's what we have so far:
{
"query": {
"bool": {
"filter": {
"bool": {
"must": [{
"has_child": {
"type": "apartment_availability",
"min_children": 8,
"max_children": 8,
"query": {
"bool": {
"must": [{
"term": {
"status": "available"
}
}, {
"range": {
"availability_date": {
"gte": "2017-03-24",
"lte": "2017-03-31"
}
}
}]
}
}
}
}]
}
}
}
}
}
My suggestion, using bucket_selector aggregation to choose between apartments:
GET /apartments/apartment/_search
{
"query": {
"bool": {
"filter": {
"bool": {
"must": [
{
"has_child": {
"type": "apartment_availability",
"query": {
"bool": {
"must": [
{
"term": {
"status": "available"
}
},
{
"range": {
"availability_date": {
"gte": "2017-04-01",
"lte": "2017-04-03"
}
}
}
]
}
}
}
}
]
}
}
}
},
"aggs": {
"apartments_ids": {
"terms": {
"field": "id",
"size": 10
},
"aggs": {
"avails": {
"children": {
"type": "apartment_availability"
},
"aggs": {
"filter_avails": {
"filter": {
"bool": {
"must": [
{
"term": {
"status": "available"
}
},
{
"range": {
"availability_date": {
"gte": "2017-04-01",
"lte": "2017-04-03"
}
}
}
]
}
},
"aggs": {
"average": {
"avg": {
"field": "price_cents"
}
}
}
}
}
},
"avg_bucket_filter": {
"bucket_selector": {
"buckets_path": {
"avg": "avails>filter_avails.average"
},
"script": "params.avg > 150 && params.avg < 300"
}
}
}
}
}
}

Search query to retrieve nested documents in elasticsearch with _source disabled

I have the following mapping
{
"cloth": {
"dynamic" : false,
"_source" : {"enabled" : false },
"properties": {
"name": {
"type": "string",
"index": "analyzed"
},
"variation": {
"type": "nested",
"properties": {
"size": {
"type": "string",
"index": "not_analyzed"
},
"color": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
I am not able to figure out a way to retrieve the nested object fields using the fields query.
{
"fields" : ["name" , "variation.size", "variation.color"],
"query" : {
"nested" : {
"path" : "variation",
"query" : {
"bool" : {
"must" : [
{ "term" : { "variation.size" : "XXL" } },
{ "term" : { "variation.color" : "red" } }
]
}
}
}
}
}
The above query returns
"_id" : "1",
"_score" : 1.987628,
"fields" : {
"variation.size" : [ "XXL", "XL" ],
"variation.color" : [ "red", "black" ],
"name" : [ "Test shirt" ]
}
When I tried
"fields" : ["name" , "variation"]
I got the error
status: 400
reason: "ElasticsearchIllegalArgumentException[field [variation] isn't a leaf field]"
Which is as expected.
How can I get the variation object as it is?
Expected Result. I need to retrieve the variable object as a whole so that I can preserve the association of size and color. Like "red" with "XXL".
"variation" : { "XXL" , "red"}
Update: Source is disabled for this Index Type.
If you use Source Filtering it will return the nested objects as a whole, your query would be:
{
"_source": [
"name",
"variation"
],
"query": {
"nested": {
"path": "variation",
"query": {
"bool": {
"must": [
{
"term": {
"variation.size": "XXL"
}
},
{
"term": {
"variation.color": "red"
}
}
]
}
}
}
}
}
You should use this:
"script_fields": {
"variation": {
"script": {
"inline": "doc['variation.size'].value + ' ' + doc['variation.red'].value"
}
}
}
I use elasticsearch v. 5.1.1

Resources