Nested Filter Query with Exclusionary ORs - search

I am unable to restrict the result set to documents that match both the kol_tags.scored.name and the kol_tags.scored.score range for each of the two or branches below.
I would like to match documents that have the kol_tags.scored.name of "Core Grower" and kol_tags.scored.score between 1 and 100 unless they also have kol_tags.scored.name of "Connectivity" where kol_tags.scored.score is NOT in the range of 35 to 65.
Given the following mapping (non nested fields omitted for brevity):
GET /production_users/user/_mapping
{
"user": {
"_all": {
"enabled": false
},
"properties": {
"kol_tags": {
"type": "nested",
"properties": {
"scored": {
"type": "nested",
"properties": {
"name": {
"type": "string",
"index": "not_analyzed",
"omit_norms": true,
"index_options": "docs"
},
"score": {
"type": "integer"
}
}
}
}
}
}
}
}
I am executing the following query:
{
"filter": {
"nested": {
"path": "kol_tags.scored",
"filter": {
"or": [
{
"and": [
{
"terms": {
"kol_tags.scored.name": [
"Core Grower"
]
}
},
{
"range": {
"kol_tags.scored.score": {
"gte": 1,
"lte": 100
}
}
}
]
},
{
"and": [
{
"terms": {
"kol_tags.scored.name": [
"Connectivity"
]
}
},
{
"range": {
"kol_tags.scored.score": {
"gte": 35,
"lte": 65
}
}
}
]
}
]
}
}
}
}
With the query above I get documents that match kol_tags.scored.name of "Core Grower" and kol_tags.scored.score between 1 and 100 and ALSO that have kol_tags.scored.name of "Connectivity" and kol_tags.scored.score in any range.
What I need is documents that match:
kol_tags.scored.name of "Core Grower" and kol_tags.scored.score between 1 and 100
kol_tags.scored.name of "Connectivity" and kol_tags.scored.score between 35 and 65
Exclude any documents that have kol_tags.scored.name of "Connectivity" and kol_tags.scored.score below 35 or above 65

There's some ambiguity in your description, but I've tried to make a runnable example that should work here: https://www.found.no/play/gist/8940202 (also embedded below)
Here are a few things I did:
Put the filter in a filtered query. A top-level filter (renamed to post_filter in Elasticsearch 1.0) should only be used if you want to filter hits but not facets.
Used bool instead of and and or, since bool filters are cacheable. More here: http://www.elasticsearch.org/blog/all-about-elasticsearch-filter-bitsets/
Most importantly, put the nested filters inside the bool, so the logic is correct with respect to what should match on the nested documents versus the parent document.
Added a must_not to account for your last point. I'm not sure whether you can have two sub-documents with the name "Connectivity", but if you can, the must_not handles that case. If you will only ever have one, you can remove the must_not.
You didn't provide any sample documents, so I made some that I think fit your description. I don't think you need two levels of nested.
#!/bin/bash
export ELASTICSEARCH_ENDPOINT="http://localhost:9200"
# Create indexes
curl -XPUT "$ELASTICSEARCH_ENDPOINT/play" -d '{
"mappings": {
"type": {
"properties": {
"kol_tags": {
"properties": {
"scored": {
"type": "nested",
"properties": {
"name": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
}
}'
# Index documents
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_bulk?refresh=true" -d '
{"index":{"_index":"play","_type":"type"}}
{"kol_tags":{"scored":[{"name":"Core Grower","score":36},{"name":"Connectivity","score":42}]}}
{"index":{"_index":"play","_type":"type"}}
{"kol_tags":{"scored":[{"name":"Connectivity","score":34},{"name":"Connectivity","score":42}]}}
{"index":{"_index":"play","_type":"type"}}
{"kol_tags":{"scored":[{"name":"Core Grower","score":36}]}}
{"index":{"_index":"play","_type":"type"}}
{"kol_tags":{"scored":[{"name":"Connectivity","score":36}]}}
'
# Do searches
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
"query": {
"filtered": {
"filter": {
"bool": {
"should": [
{
"nested": {
"path": "kol_tags.scored",
"filter": {
"bool": {
"must": [
{
"term": {
"name": "Core Grower"
}
},
{
"range": {
"score": {
"gte": 1,
"lte": 100
}
}
}
]
}
}
}
},
{
"nested": {
"path": "kol_tags.scored",
"filter": {
"bool": {
"must": [
{
"term": {
"name": "Connectivity"
}
},
{
"range": {
"score": {
"gte": 35,
"lte": 65
}
}
}
]
}
}
}
}
],
"must_not": [
{
"nested": {
"path": "kol_tags.scored",
"filter": {
"bool": {
"must": [
{
"term": {
"name": "Connectivity"
}
},
{
"not": {
"range": {
"score": {
"gte": 35,
"lte": 65
}
}
}
}
]
}
}
}
}
]
}
}
}
}
}
'
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
"filter": {
"nested": {
"path": "kol_tags.scored",
"filter": {
"or": [
{
"and": [
{
"terms": {
"kol_tags.scored.name": [
"Core Grower"
]
}
},
{
"range": {
"kol_tags.scored.score": {
"gte": 1,
"lte": 100
}
}
}
]
},
{
"and": [
{
"terms": {
"kol_tags.scored.name": [
"Connectivity"
]
}
},
{
"range": {
"kol_tags.scored.score": {
"gte": 35,
"lte": 65
}
}
}
]
}
]
}
}
}
}
'
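To make the should/must_not logic of the first search easier to follow, here is a minimal Python sketch that mimics it in memory. It is not Elasticsearch code: the nested and matches helpers are hypothetical stand-ins for the nested filter and the outer bool, and the documents are the four sample documents from the bulk request above.

```python
# Sample documents copied from the bulk request above: each entry is the list
# of nested kol_tags.scored objects belonging to one parent document.
DOCS = [
    [{"name": "Core Grower", "score": 36}, {"name": "Connectivity", "score": 42}],
    [{"name": "Connectivity", "score": 34}, {"name": "Connectivity", "score": 42}],
    [{"name": "Core Grower", "score": 36}],
    [{"name": "Connectivity", "score": 36}],
]


def nested(scored, name, lo, hi, negate_range=False):
    """Mimic a nested filter: true if one single sub-document meets both conditions."""
    for tag in scored:
        in_range = lo <= tag["score"] <= hi
        if tag["name"] == name and in_range != negate_range:
            return True
    return False


def matches(scored):
    """Mimic the outer bool: at least one should clause and no must_not clause."""
    should = (
        nested(scored, "Core Grower", 1, 100)
        or nested(scored, "Connectivity", 35, 65)
    )
    must_not = nested(scored, "Connectivity", 35, 65, negate_range=True)
    return should and not must_not


matching = [i for i, scored in enumerate(DOCS) if matches(scored)]
print(matching)  # [0, 2, 3]
```

The second document is dropped because one of its "Connectivity" sub-documents has a score of 34, which is exactly the case the must_not clause is meant to catch.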

Related

How to do elasticsearch aggregation together with sort and find duplicate values

I want to find duplicate values and, where duplicates exist, sort them by last update so that I keep only the newest one. How do I do this with aggregations? I've tried the aggregation below.
I've tried adding a sort to the sources, but it still doesn't work. I've tried several approaches, and each fails in some way: sometimes I get a single result but it is the old data; sometimes the order is correct (newest first) but two results appear.
{
"size": 0,
"query": {
"bool": {
"must": [
{
"match": {
"BILLING_TYPE_CD": "Service Bundle"
}
},
{
"match": {
"ID": "xxxx"
}
},
{
"exists": {
"field": "LI_MILESTONE"
}
},
{
"exists": {
"field": "LI_SID"
}
},
{
"query_string": {
"default_field": "LI_SID",
"query": "*xxxx*"
}
}
],
"must_not": {
"bool": {
"must": [
{
"query_string": {
"default_field": "LI_PRODUCT_NAME",
"query": "*Network*"
}
},
{
"terms": {
"LI_MILESTONE.keyword": [
"Abandoned",
"Cancelled"
]
}
},
{
"terms": {
"ORDER_STATUS.keyword": [
"Abandoned",
"Cancelled",
"Drop In Progress"
]
}
},
{
"term": {
"STATUS.keyword": ""
}
}
]
}
}
}
},
"sort": [
{
"TGL_CREATED": {
"order": "desc"
}
}
],
"aggs": {
"list_products": {
"composite": {
"size": 50000,
"sources": [
{
"LI_SID": {
"terms": {
"field": "LI_SID.keyword",
"order": "desc"
}
}
}
]
},
"aggs": {
"totalService": {
"terms": {
"field": "LI_SID.keyword",
"size": 50000,
"order": {
"_term": "asc"
}
}
},
"bucket_sort": {
"bucket_sort": {
"from": 0,
"size": 10
}
},
"includes_source": {
"top_hits": {
"size": 1,
"_source": {
"includes": [
"LAST_UPDATE",
"xxxxx",
"xxxxx",
"xxxxx",
"xxx"
]
}
}
}
}
},
"term_product": {
"terms": {
"field": "LI_SID.keyword",
"size": 50000
}
}
}
}
Like this?
{
"aggs": {
"LI_SID": {
"terms": {
"field": "LI_SID.keyword",
"size": 10
},
"aggs": {
"hit": {
"top_hits": {
"size": 1,
"sort": [
{
"LAST_UPDATE": "desc"
}
]
}
}
}
}
},
"size": 0
}
You need to read the aggregations section of the response, not hits.
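As a rough illustration of reading the aggregations section, the Python sketch below walks the buckets of a terms aggregation named LI_SID with a top_hits sub-aggregation named hit, as in the query above. The newest_per_key helper and the sample response (including its field values) are made up for the example; only the response layout follows what Elasticsearch returns for this kind of aggregation.

```python
def newest_per_key(response, agg_name="LI_SID", hit_name="hit"):
    """Return {bucket key: _source of the single top hit} from a search response."""
    newest = {}
    for bucket in response["aggregations"][agg_name]["buckets"]:
        top = bucket[hit_name]["hits"]["hits"]
        if top:  # defensive: skip a bucket that carries no top hit
            newest[bucket["key"]] = top[0]["_source"]
    return newest


def duplicated_keys(response, agg_name="LI_SID"):
    """Keys that occur in more than one document, i.e. the duplicates."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return [b["key"] for b in buckets if b["doc_count"] > 1]


# Hypothetical, trimmed response; the values are invented for illustration.
sample_response = {
    "hits": {"total": 3, "hits": []},  # empty because the request used size 0
    "aggregations": {
        "LI_SID": {
            "buckets": [
                {
                    "key": "sid-1",
                    "doc_count": 2,  # two documents share this LI_SID
                    "hit": {"hits": {"hits": [
                        {"_source": {"LI_SID": "sid-1", "LAST_UPDATE": "2021-05-02"}}
                    ]}},
                },
                {
                    "key": "sid-2",
                    "doc_count": 1,
                    "hit": {"hits": {"hits": [
                        {"_source": {"LI_SID": "sid-2", "LAST_UPDATE": "2021-04-11"}}
                    ]}},
                },
            ]
        }
    },
}

print(newest_per_key(sample_response))
print(duplicated_keys(sample_response))
```

Because top_hits is sorted by LAST_UPDATE descending with size 1, the single hit in each bucket is the newest document for that LI_SID.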

count of records that have matched conditions inside a bool query

I have an ES query along the lines of (condition1 or condition2 or condition3....) and otherConditions.
Each condition inside the brackets is a 'must' clause that searches for all documents that match a given name, location and product.
GET index/type/_count
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"term": {
"NAME": {
"value": "name1"
}
}
},
{
"term": {
"PRODUCT": {
"value": "product1"
}
}
},
{
"term": {
"LOCATION": {
"value": "location1"
}
}
}
]
}
},
{
"bool": {
"must": [
{
"term": {
"NAME": {
"value": "name2"
}
}
},
{
"term": {
"PRODUCT": {
"value": "product2"
}
}
},
{
"term": {
"LOCATION": {
"value": "location2"
}
}
}
]
}
}
],
"must_not": [
{
"exists": {
"field": "some other condition"
}
}
],
"must": [
{
"term": {
"somefield": "value"
}
},
{
"range": {
"time": {
"gte": "now-6M"
}
}
}
]
}
}
}
Is it possible, with a single query, to get the count of records matching each 'must' clause inside the 'should' clause instead of one overall count?
Yes, you can do it with aggregations, in particular the filter aggregation. The query might look like this:
POST index/type/_search
{
"query": {
"bool": {
"should": [
"<clause1>",
"<clause2>"
],
"must_not": [
"<mustNotClause3>"
],
"must": [
"<mustClause4>"
]
}
},
"aggs": {
"clause1": {
"filter": "<clause1>"
},
"clause2": {
"filter": "<clause2>"
}
}
}
Note that we are using the _search API here rather than _count. If you don't need the search results, you can set size: 0, which returns only the total count and the aggregations.
In your case the query will literally be this:
POST index/type/_search
{
"query": {
"bool": {
"should": [
{
"bool": {
"must": [
{
"term": {
"NAME": {
"value": "name1"
}
}
},
{
"term": {
"PRODUCT": {
"value": "product1"
}
}
},
{
"term": {
"LOCATION": {
"value": "location1"
}
}
}
]
}
},
{
"bool": {
"must": [
{
"term": {
"NAME": {
"value": "name2"
}
}
},
{
"term": {
"PRODUCT": {
"value": "product2"
}
}
},
{
"term": {
"LOCATION": {
"value": "location2"
}
}
}
]
}
}
],
"must_not": [
{
"exists": {
"field": "some other condition"
}
}
],
"must": [
{
"term": {
"somefield": "value"
}
},
{
"range": {
"time": {
"gte": "now-6M"
}
}
}
]
}
},
"aggs": {
"clause1": {
"filter": {
"bool": {
"must": [
{
"term": {
"NAME": {
"value": "name1"
}
}
},
{
"term": {
"PRODUCT": {
"value": "product1"
}
}
},
{
"term": {
"LOCATION": {
"value": "location1"
}
}
}
]
}
}
},
"clause2": {
"filter": {
"bool": {
"must": [
{
"term": {
"NAME": {
"value": "name2"
}
}
},
{
"term": {
"PRODUCT": {
"value": "product2"
}
}
},
{
"term": {
"LOCATION": {
"value": "location2"
}
}
}
]
}
}
}
}
}
Note that the sum of the counts of aggregations clause1 and clause2 may be greater than the total count, since a document that matches more than one clause is counted once in each.
Hope that helps!
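Since each should clause has to be repeated verbatim as a filter aggregation, it can be convenient to generate the request body instead of writing it by hand. The following Python sketch shows one way to do that; with_clause_counts and clause_counts are hypothetical helper names, and the short clauses at the bottom are placeholders rather than the full query from the question.

```python
def with_clause_counts(query):
    """Wrap a bool query in a search body with one filter aggregation per should clause."""
    should = query["bool"]["should"]
    aggs = {
        f"clause{i}": {"filter": clause}
        for i, clause in enumerate(should, start=1)
    }
    # size 0: only the total and the aggregations are needed, not the hits.
    return {"size": 0, "query": query, "aggs": aggs}


def clause_counts(response):
    """Extract {aggregation name: doc_count} from the search response."""
    return {name: agg["doc_count"] for name, agg in response["aggregations"].items()}


# Placeholder clauses standing in for the name/product/location groups.
query = {
    "bool": {
        "should": [
            {"bool": {"must": [{"term": {"NAME": {"value": "name1"}}}]}},
            {"bool": {"must": [{"term": {"NAME": {"value": "name2"}}}]}},
        ],
        "must": [{"term": {"somefield": "value"}}],
    }
}

body = with_clause_counts(query)
print(sorted(body["aggs"]))  # ['clause1', 'clause2']
```

The body can then be sent to the _search endpoint with whatever client you use, and clause_counts applied to the parsed response.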

Elasticsearch sorting not working properly based on time

I have 20 documents and I'm aggregating them by reportId. I need the top 10 buckets ordered by time, descending, but the order of the response looks random. What am I missing? I'm using Elasticsearch 6.2.2 and Node.js 4.5. Below is the search request body.
{
"size": 0,
"sort": [
{
"triggerDate":
{
"order": "desc"
}
}],
"query":
{
"bool":
{
"must": [
{
"query_string":
{
"query": "*",
"analyze_wildcard": true
}
},
{
"range":
{
"triggerDate":
{
"gte": fromTime,
"lte": toTime
}
}
}
],
"must_not": [
{
"query_string":
{
"query": "reportId.keyword:\"\"",
"analyze_wildcard": true
}
}]
}
},
"_source":
{
"excludes": []
},
"aggs":
{
"reportid":
{
"terms":
{
"field": "reportId.keyword",
"size": 10
}
}
}
}
I think what you need to do is aggregate on reportId.keyword and order the aggregation buckets by date.
Here is the solution:
{
"size": 0,
"query": {
"bool": {
"must": [
{
"query_string": {
"query": "*",
"analyze_wildcard": true
}
},
{
"range": {
"triggerDate": {
"gte": fromTime,
"lte": toTime
}
}
}
],
"must_not": [
{
"query_string": {
"query": "reportId.keyword:\"\"",
"analyze_wildcard": true
}
}
]
}
},
"_source": {
"excludes": []
},
"aggs": {
"reportid": {
"terms": {
"field": "reportId.keyword",
"size": 10,
"order": {
"2-orderAgg": "desc"
}
},
"aggs": {
"2-orderAgg": {
"max": {
"field": "triggerDate"
}
}
}
}
}
}
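The top-level sort only orders hits, and with size 0 no hits are returned; terms buckets are ordered by document count by default. To order the buckets by time, you need to sort the terms aggregation on a sub-aggregation (here the max of triggerDate) rather than sorting the query results.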
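The difference between the two orderings can be seen in a small in-memory Python sketch. It does not call Elasticsearch; by_doc_count and by_latest_trigger are hypothetical helpers that imitate a terms aggregation ordered by document count and one ordered by a max sub-aggregation, and the documents use invented integer timestamps.

```python
from collections import Counter


def by_doc_count(docs, size):
    """Imitate the default terms ordering: buckets with the most documents first."""
    counts = Counter(doc["reportId"] for doc in docs)
    return [report_id for report_id, _ in counts.most_common(size)]


def by_latest_trigger(docs, size):
    """Imitate ordering the terms buckets by a max(triggerDate) sub-aggregation, descending."""
    latest = {}
    for doc in docs:
        report_id = doc["reportId"]
        latest[report_id] = max(latest.get(report_id, doc["triggerDate"]), doc["triggerDate"])
    ranked = sorted(latest.items(), key=lambda item: item[1], reverse=True)
    return [report_id for report_id, _ in ranked[:size]]


# Invented documents: r1 has the most documents, r2 has the most recent one.
docs = [
    {"reportId": "r1", "triggerDate": 100},
    {"reportId": "r1", "triggerDate": 200},
    {"reportId": "r1", "triggerDate": 300},
    {"reportId": "r2", "triggerDate": 900},
    {"reportId": "r3", "triggerDate": 500},
    {"reportId": "r3", "triggerDate": 600},
]

print(by_doc_count(docs, 3))       # ['r1', 'r3', 'r2']
print(by_latest_trigger(docs, 3))  # ['r2', 'r3', 'r1']
```

The first ordering is unrelated to time, which would explain why the original response looked random.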

Elastic Search: Matching fields in different nested objects

I am new to Elastic Search and this is my user index:
{
"user": {
"properties": {
"branches": {
"type": "nested"
},
"lists": {
"type": "nested"
},
"events": {
"type": "nested"
},
"optOuts": {
"type": "nested"
}
}
}
}
Here, branches, events and lists each contain the fields id (int) and countryIso (String).
I need to find users having emails who belong to countryIso 'XX' for example.
{
"query": {
"bool": {
"must": [
{
"exists": {
"field": "email"
}
},
{
"match": {
"prog_id": 3
}
},
{
"nested": {
"path": [
"branches"
],
"query": {
"query_string": {
"fields": [
"branches.countryIso"
],
"query": "AE KW"
}
}
}
}
]
}
}
}
This way I can get them if they have that country in the branches object. What I want is a match when the countryIso is present in branches or lists or events.
Note: any of these might be empty, i.e. branches might not be there, or lists might not be there, and so on. Or lists might be there with no countryIso.
I tried this:
{
"query": {
"bool": {
"must": [
{
"exists": {
"field": "email"
}
},
{
"match": {
"prog_id": 3
}
},
{
"nested": {
"path": [
"branches"
],
"query": {
"query_string": {
"fields": [
"branches.countryIso"
],
"query": "AE KW"
}
}
}
},
{
"nested": {
"path": [
"lists"
],
"query": {
"query_string": {
"fields": [
"lists.countryIso"
],
"query": "AE KW"
}
}
}
}
]
}
}
}
and this:
{
"query": {
"bool": {
"must": [
{
"exists": {
"field": "email"
}
},
{
"match": {
"prog_id": 3
}
},
{
"nested": {
"path": [
"branches",
"lists"
],
"query": {
"query_string": {
"fields": [
"branches.countryIso",
"lists.countryIso"
],
"query": "AE KW"
}
}
}
}
]
}
}
}
But neither works.
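One direction that may be worth trying, offered here as an untested sketch rather than a confirmed fix: the first attempt places both nested clauses in must, so a user has to match in branches and in lists at the same time, and the second passes a list to path, which expects a single path. Putting one nested query per path into a bool should expresses the "or". The Python helper below only builds the request body; country_query is a hypothetical name, and the email and prog_id conditions are copied from the queries above.

```python
import json


def country_query(countries, paths=("branches", "lists", "events")):
    """Build a body matching users whose countryIso appears under any of the nested paths."""
    nested_clauses = [
        {
            "nested": {
                "path": path,  # one nested query per path, each with a single path
                "query": {
                    "query_string": {
                        "fields": [f"{path}.countryIso"],
                        "query": " ".join(countries),
                    }
                },
            }
        }
        for path in paths
    ]
    return {
        "query": {
            "bool": {
                "must": [
                    {"exists": {"field": "email"}},
                    {"match": {"prog_id": 3}},
                    # At least one of the nested paths has to match.
                    {"bool": {"should": nested_clauses, "minimum_should_match": 1}},
                ]
            }
        }
    }


body = country_query(["AE", "KW"])
print(json.dumps(body, indent=2))
```

A user with no branches at all can still match through lists or events, because only one of the should clauses has to be satisfied.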

Limit filter by terms elastic search

I would like to set a size limit per term: retrieve 3 results for the term "tag", 5 results for the term "dossier" and 1 result for the term "personality".
Can I use the limit filter, or is there another solution?
{
"_source":{
"include":[
"path",
"type"
]
},
"query":{
"bool":{
"should":[
{
"match":{
"title.acp":{
"query":"car",
"boost":10
}
}
},
{
"match":{
"title.acp":{
"query":"car",
"fuzziness":"AUTO",
"prefix_length":3
}
}
}
],
"filter":[
{
"terms":{
"type":[
"tag",
"dossier",
"personality"
]
}
}
]
}
},
"highlight":{
"fields":{
"title.acp":{}
}
}
}
It looks like, for a given title, you want the top x documents for each type, where x varies with the type.
One way to do this is to combine a filter aggregation with a top_hits sub-aggregation for each type.
Example:
{
"size": 0,
"query": {
"bool": {
"should": [
{
"match": {
"title.acp": {
"query": "car",
"boost": 10
}
}
},
{
"match": {
"title.acp": {
"query": "car",
"fuzziness": "AUTO",
"prefix_length": 3
}
}
}
],
"filter": [
{
"terms": {
"type": [
"tag",
"dossier",
"personality"
]
}
}
]
}
},
"aggs": {
"tag": {
"filter": {
"term": {
"type": "tag"
}
},
"aggs": {
"tag_top_hits": {
"top_hits": {
"_source": {
"include": [
"path",
"type"
]
},
"size": 3,
"highlight": {
"fields": {
"title.acp": {}
}
}
}
}
}
},
"dossier": {
"filter": {
"term": {
"type": "dossier"
}
},
"aggs": {
"dossier_top_hits": {
"top_hits": {
"_source": {
"include": [
"path",
"type"
]
},
"size": 5,
"highlight": {
"fields": {
"title.acp": {}
}
}
}
}
}
},
"personality": {
"filter": {
"term": {
"type": "personality"
}
},
"aggs": {
"personality_top_hits": {
"top_hits": {
"_source": {
"include": [
"path",
"type"
]
},
"size": 1,
"highlight": {
"fields": {
"title.acp": {}
}
}
}
}
}
}
}
}
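The three aggregations above differ only in the type name and the size, so they can be generated from a small mapping. The Python sketch below is one way to do that under that assumption; per_type_top_hits is a hypothetical helper, and the defaults for the source fields and the highlight field are taken from the example query.

```python
def per_type_top_hits(sizes, source_fields=("path", "type"), highlight_field="title.acp"):
    """Build one filter aggregation with a top_hits sub-aggregation per type."""
    aggs = {}
    for type_name, size in sizes.items():
        aggs[type_name] = {
            "filter": {"term": {"type": type_name}},
            "aggs": {
                f"{type_name}_top_hits": {
                    "top_hits": {
                        "_source": {"include": list(source_fields)},
                        "size": size,  # per-type limit
                        "highlight": {"fields": {highlight_field: {}}},
                    }
                }
            },
        }
    return aggs


aggs = per_type_top_hits({"tag": 3, "dossier": 5, "personality": 1})
print(aggs["dossier"]["aggs"]["dossier_top_hits"]["top_hits"]["size"])  # 5
```

The returned dictionary can be placed under the aggs key of the search body, next to size 0 and the existing query.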
