My search query works fine (I hope), but sometimes I get too many results with scores like 1.5, 0.7, 0.6... or 0.1, 0.001, 0.001...
Is it possible to block low-relevance results?
A fixed cutoff value is unsuitable, because the threshold depends on the maximum _score (the score of the most relevant result).
It should work like "block all results whose _score is less than half of the maximum _score (the score of the most relevant result)":
{
"query": {
"bool": {
"disable_coord": true,
"must": [
{ "match": {
"ObjectTypeSysName": {
"query": "participant"
}
}
},
{ "match": {
"_all": {
"query": "text-to-find",
"operator": "and",
"fuzziness": "AUTO",
"minimum_should_match": 1
}
}}
],
"should": [
{ "multi_match" : {
"query": "text-to-find",
"type": "best_fields",
"fields": [
"*NAME",
"ObjectData.EXTERNALID",
"ObjectData.contactList.VALUE",
"*SERIES",
"*NUMBER"
],
"operator": "or",
"boost": 2
}}
]
}
}
}
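As far as I know, there is no built-in relative score cutoff; the closest thing is the top-level min_score parameter, which is an absolute threshold. A common workaround (a sketch, assuming you can afford two round trips) is to read max_score from a first response and re-issue the search with min_score set to half of it, e.g. 0.75 for a max_score of 1.5. Shown here with a simplified query; in practice you would keep the full bool query from above:
{
  "min_score": 0.75,
  "query": {
    "match": {
      "_all": {
        "query": "text-to-find"
      }
    }
  }
}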
{
"query": {
"bool": {
"must": {
"bool": {
"must": [
{
"multi_match": {
"query": [
"869336","45345345"
],
"type": "phrase_prefix",
"fields": [
"id",
"accountNo",
"moblileNo"
]
}
}
],
"should": [],
"must_not": []
}
},
"filter": {
"bool": {
"must": {
"bool": {
"must": [],
"should": [],
"must_not": []
}
}
}
}
}
}
}
I need to use a multi_match query only. I have mentioned some sample fields in the query. I get the below error when I run it in Postman:
[multi_match] unknown token [START_ARRAY] after [query]
The error occurs because query only takes a single input string.
It can still hold multiple values, like "query" : "869336 45345345", with the values separated by spaces. You can go through this link to see how that works.
Now looking at your scenario, assuming you want to apply phrase-match queries for both of the values, i.e. 869336 and 45345345, you just need to separate the values out into their own individual multi_match queries:
POST <your_index_name>/_search
{
"query": {
"bool": {
"must": {
"bool": {
"must": [
{
"multi_match": {
"query": "869336",
"type": "phrase_prefix",
"fields": [
"accountNo",
"moblileNo"
]
}
},
{
"multi_match": {
"query": "45345345",
"type": "phrase_prefix",
"fields": [
"accountNo",
"moblileNo"
]
}
}
],
"should": [],
"must_not": []
}
},
"filter": {
"bool": {
"must": {
"bool": {
"must": [],
"should": [],
"must_not": []
}
}
}
}
}
}
}
Now if you do not want to apply phrase_prefix and instead want to return all the documents having both values in any of the fields, you can simply write the query as below:
POST my-multimatch-index/_search
{
"query": {
"bool": {
"must": {
"bool": {
"must": [
{
"multi_match": {
"query": "869336 45345345",
"fields": [
"accountNo",
"moblileNo"
],
"type": "cross_fields", <--- Note this
"operator": "and" <--- This would mean only return those documents having both these value.
}
}
],
"should": [],
"must_not": []
}
},
"filter": {
"bool": {
"must": {
"bool": {
"must": [],
"should": [],
"must_not": []
}
}
}
}
}
}
}
Note the comments above and how I made use of cross_fields. It is best to go through the links to get a better understanding.
Feel free to ask questions, and I hope this helps!
{
"query": {
"bool": {
"must": {
"bool": {
"must": [
{
"multi_match": {
"query": "FGTHSF-2124-6",
"type": "phrase_prefix",
"fields": [
"contact.name"
]
}
},
{
"terms": {
"contact.id": [
"sdfwerwe",
"6789",
"4567",
"12345"
]
}
}
],
"should": [],
"must_not": []
}
},
"filter": {
"bool": {
"must": {
"bool": {
"must": [],
"should": [],
"must_not": []
}
}
}
}
}
}
}
I have this query. If I search with it I get no results, because the last part is a single digit; if I give three digits, it finds the proper record. Is there a default minimum size for the phrase_prefix query, and if so, how do I change it?
I tried setting the default operator, prefix_length, max_expansions, etc.
contact.name has "standard" as its search_analyzer and "autocomplete" as its index analyzer (the settings are not available in the question). Your issue is that the field is converted to different tokens at index time and at search time, so the two do not match.
Usually, the same analyzer should be applied at index time and at search time, to ensure that the terms in the query are in the same format as the terms in the inverted index.
Sometimes, though, it can make sense to use a different analyzer at search time, such as when using the edge_ngram tokenizer for autocomplete.
For example, with the settings below:
"settings": {
"analysis": {
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 3,
"max_gram": 3
}
},
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"autocomplete_filter"
]
}
}
}
},
"mappings": {
.....
}
The following tokens will be stored in the index:
GET index79/_analyze
{
"text": "FGTHSF-2124-645",
"analyzer": "autocomplete"
}
{
"tokens" : [
{
"token" : "fgt",
"start_offset" : 0,
"end_offset" : 6,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "212",
"start_offset" : 7,
"end_offset" : 11,
"type" : "<NUM>",
"position" : 1
},
{
"token" : "645",
"start_offset" : 12,
"end_offset" : 15,
"type" : "<NUM>",
"position" : 2
}
]
}
Now when you search for "query": "HSF-2124-6", it will not return any document: the query is analyzed to ["hsf", "2124", "6"], and "6" is not present in any stored token. To return the document you need to change "min_gram" to 1, so that tokens of sizes 1, 2 and 3 are generated (645 => 6, 64, 645).
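You can confirm the search-time tokens with the _analyze API; the standard search analyzer splits on the hyphens and lowercases, without any n-gramming:
GET index79/_analyze
{
  "text": "HSF-2124-6",
  "analyzer": "standard"
}
This returns the tokens hsf, 2124 and 6; as described above, 6 is not present in the index.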
If you are not using edge n-grams, then please add your autocomplete analyzer definition to the question.
Edit 1:
As you can see in your snapshot, the min gram size is 5 and the max gram is 20, so the smallest token generated is of size 5. For example, "Abhijeet" generates the tokens "Abhij", "Abhije", "Abhijee", "Abhijeet", so any text shorter than 5 characters will not match, e.g. "Abhi". In your case the text is split on the hyphen ("-"), so "6" does not match.
You need to update your index settings and set min_gram to 1.
Steps:
1. Close the index: POST /index80/_close
2. Update the settings. The snippet below is just an example; copy the entire analysis section from your own settings, update min_gram, and send it in a PUT request:
PUT index80/_settings
{
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 10,
"token_chars": [
"letter",
"digit"
]
}
}
}
}
3. Reopen the index: POST /index80/_open
Check here for further info on how to update settings.
Note: Reducing the min gram size will cause additional tokens to be created and will increase the size of your index. The choice of min and max gram should be based on your expected query text size.
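Also note that updating the analyzer only affects documents indexed afterwards; existing documents keep their old tokens until they are re-indexed. Assuming your Elasticsearch version supports it, they can be re-indexed in place:
POST index80/_update_by_query?conflicts=proceed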
This is my DSL:
{
"query": {
"function_score": {
"query": {
"multi_match": {
"query": "testa",
"analyzer":"standard",
"type": "best_fields",
"fields": [ "name^5", "content^1" ]
}
},
"field_value_factor": {
"field": "popular",
"modifier": "log1p",
"factor": 0.1
},
"boost_mode": "sum",
"max_boost": 1.5
}
}
}
When I search for a keyword like 'testa', the results only contain the keyword 'testa'. What should I do to make the results also contain the keywords 'test' or 'tes'?
Thank you.
You can use ngram for partial-word search, but you need to reindex your documents.
You can check the official example
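A minimal sketch of such analysis settings, with hypothetical names (my-ngram-index, trigram_filter, trigram_analyzer):
PUT my-ngram-index
{
  "settings": {
    "analysis": {
      "filter": {
        "trigram_filter": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 4
        }
      },
      "analyzer": {
        "trigram_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "trigram_filter"
          ]
        }
      }
    }
  }
}
Applied to name and content at both index time and search time, 'testa', 'test' and 'tes' all share the 3-gram 'tes', so a search for 'testa' can match documents containing the shorter variants.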
You may use a Fuzzy Match Query, where your query will be like:
{
"query": {
"function_score": {
"query": {
"multi_match": {
"query": "testa",
"analyzer":"standard",
"fuzziness":"3",
"type": "best_fields",
"fields": [ "name^5", "content^1" ]
}
},
"field_value_factor": {
"field": "popular",
"modifier": "log1p",
"factor": 0.1
},
"boost_mode": "sum",
"max_boost": 1.5
}
}
}
Also, the Simple Query String Query might help, but you will have to enter your term as "tes*", which may or may not be acceptable for your use case.
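A sketch of such a query, reusing the fields from your DSL:
{
  "query": {
    "simple_query_string": {
      "query": "tes*",
      "fields": [ "name^5", "content^1" ]
    }
  }
}
The trailing * turns the term into a prefix, so 'tes*' matches 'tes', 'test' and 'testa' without reindexing.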
I am querying for aggregate data based on a date_range, like below:
"aggs": {
"range": {
"date_range": {
"field": "sold",
"ranges": [
{ "from": "2014-11-01", "to": "2014-11-30" },
{ "from": "2014-08-01", "to": "2014-08-31" }
]
}
}
}
Using this, I get the following response:
"aggregations": {
"range": {
"buckets": [
{
"key": "2014-08-01T00:00:00.000Z-2014-08-31T00:00:00.000Z",
"from": 1406851200000,
"from_as_string": "2014-08-01T00:00:00.000Z",
"to": 1409443200000,
"to_as_string": "2014-08-31T00:00:00.000Z",
"doc_count": 1
},
{
"key": "2014-11-01T00:00:00.000Z-2014-11-30T00:00:00.000Z",
"from": 1414800000000,
"from_as_string": "2014-11-01T00:00:00.000Z",
"to": 1417305600000,
"to_as_string": "2014-11-30T00:00:00.000Z",
"doc_count": 2
}
]
}
}
but instead of only doc_count, I also need the complete aggregate data that satisfies each range.
Is there any way to get this? Please help.
It's not clear what other fields you're looking for, so I've included a couple of examples.
By nesting another aggs inside your first one, you can ask Elasticsearch to pull back additional values e.g. averages, sums, counts, min, max, stats, etc.
This example query will bring back field_count (a count of instances of myfield) and also order_count (a sum based on a script):
"aggs": {
"range": {
"date_range": {
"field": "sold",
"ranges": [
{ "from": "2014-11-01", "to": "2014-11-30" },
{ "from": "2014-08-01", "to": "2014-08-31" }
]
}
}
},
"aggs" : {
"field_count": {"value_count" : { "field" : "myfield" } },
"order_count": {"sum" : {"script" : " doc[\"output_msgtype\"].value == \"order\" ? 1 : 0"} } }}
}
If you aren't looking for any sums, counts, or averages on your data, then an aggregation isn't going to help.
I would instead run a standard query once per range, e.g.:
curl -XGET 'http://localhost:9200/test/cars/_search?pretty' -d '{
"fields" : ["price", "color", "make", "sold" ],
"query":{
"filtered": {
"query": {
"match_all" : { }
},
"filter" : {
"range": {"sold": {"gte": "2014-09-21T20:03:12.963","lte": "2014-09-24T20:03:12.963"}}}
}
}
}'
Repeat this query as needed, modifying the range each time.
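If the extra round trips are a concern, the same per-range queries can be batched into one request with the _msearch API (a sketch, reusing the ranges from the aggregation above):
curl -XGET 'http://localhost:9200/test/cars/_msearch?pretty' -d '
{}
{"fields":["price","color","make","sold"],"query":{"filtered":{"query":{"match_all":{}},"filter":{"range":{"sold":{"gte":"2014-08-01","lte":"2014-08-31"}}}}}}
{}
{"fields":["price","color","make","sold"],"query":{"filtered":{"query":{"match_all":{}},"filter":{"range":{"sold":{"gte":"2014-11-01","lte":"2014-11-30"}}}}}}
'
Each empty {} header line applies the request to the index and type in the URL; the responses come back in the same order.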
I have a query like so:
{
"sort": [
{
"_geo_distance": {
"geo": {
"lat": 39.802763999999996,
"lon": -105.08748399999999
},
"order": "asc",
"unit": "mi",
"mode": "min",
"distance_type": "sloppy_arc"
}
}
],
"query": {
"bool": {
"minimum_number_should_match": 0,
"should": [
{
"match": {
"name": ""
}
},
{
"match": {
"credit": true
}
}
]
}
}
}
I want my search to always return ALL results, just sorted with those which have matching flags closer to the top.
I would like the sorting priority to go something like:
searchTerm (name, a string)
flags (credit/atm/ada/etc, boolean values)
distance
How can this be achieved?
So far, the query you see above is all I've gotten. I haven't been able to figure out how to always return all results, nor how to incorporate the additional queries into the sort.
I don't believe "sort" is the answer you are looking for, actually. I believe you need a trial-and-error approach, starting with a simple "bool" query where you put all your criteria (name, flags, distance). Then you give your name criterion more weight (boost), a little bit less to your flags, and even less to the distance calculation.
A "bool" "should" can give you a list of documents sorted by the _score of each, and, depending on how you score each criterion, the _score is influenced more or less.
Also, returning ALL the elements is not difficult: just add a "match_all": {} to your "bool" "should" query.
This would be a starting point, from my point of view; depending on your documents and your requirements (see my comment to your post about the confusion), you would need to adjust the "boost" values, test, adjust again, and test again:
{
"query": {
"bool": {
"should": [
{ "constant_score": {
"boost": 6,
"query": {
"match": { "name": { "query": "something" } }
}
}},
{ "constant_score": {
"boost": 3,
"query": {
"match": { "credit": { "query": true } }
}
}},
{ "constant_score": {
"boost": 3,
"query": {
"match": { "atm": { "query": false } }
}
}},
{ "constant_score": {
"boost": 3,
"query": {
"match": { "ada": { "query": true } }
}
}},
{ "constant_score": {
"query": {
"function_score": {
"functions": [
{
"gauss": {
"geo": {
"origin": {
"lat": 39.802763999999996,
"lon": -105.08748399999999
},
"offset": "2km",
"scale": "3km"
}
}
}
]
}
}
}
},
{
"match_all": {}
}
]
}
}
}