Searching for names (text) that contain spaces is causing me problems.
I have a mapping similar to:
{"user":{"properties":{"name":{"type":"string"}}}}
Ideally, the search should return and rank results as follows:
1) Names that exactly match the search term (highest score)
2) Names that start with the search term (high score)
3) Names that contain the exact search term as a substring (medium score)
4) Names that contain any of the search term's tokens (lowest score)
Example
For following names in elasticsearch
Maaz Tariq
Ahmed Maaz Tariq
Maaz Sheeba
Maaz Bin Tariq
Sana Tariq
Maaz Tariq Ahmed
Searching for "Maaz Tariq", the results should be in the following order:
Maaz Tariq (highest score)
Maaz Tariq Ahmed (high score)
Ahmed Maaz Tariq (medium score)
Maaz Bin Tariq (lowest score)
Maaz Sheeba (lowest score)
Sana Tariq (lowest score)
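The desired ordering above can be written down as a small reference function, which is handy for checking whatever query is eventually used. This is only a plain-Python sketch of the four tiers, not Elasticsearch scoring; the function name `rank_tier` and the numeric tier values are assumptions made for illustration.

```python
def rank_tier(name: str, term: str) -> int:
    """Return 3 (exact), 2 (prefix), 1 (substring), 0 (shares a token), -1 (no match)."""
    n, t = name.lower(), term.lower()
    if n == t:
        return 3
    if n.startswith(t):
        return 2
    if t in n:
        return 1
    if set(n.split()) & set(t.split()):
        return 0
    return -1


names = [
    "Maaz Tariq",
    "Ahmed Maaz Tariq",
    "Maaz Sheeba",
    "Maaz Bin Tariq",
    "Sana Tariq",
    "Maaz Tariq Ahmed",
]

# sorted() is stable, so names in the same tier keep their input order.
ranked = sorted(names, key=lambda n: rank_tier(n, "Maaz Tariq"), reverse=True)
for name in ranked:
    print(rank_tier(name, "Maaz Tariq"), name)
```

The order inside the lowest tier is not specified in the question, so the sketch leaves it as the input order.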
Can anyone point me to which analyzers to use, and how to rank the search results for names?
You can use the multi field type, a bool query and the custom boost factor query to solve this problem.
Mapping:
{
"mappings" : {
"user" : {
"properties" : {
"name": {
"type": "multi_field",
"fields": {
"name": { "type" : "string", "index": "analyzed" },
"exact": { "type" : "string", "index": "not_analyzed" }
}
}
}
}
}
}
Query:
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "Maaz Tariq"
}
}
],
"should": [
{
"custom_boost_factor": {
"query": {
"term": {
"name.exact": "Maaz Tariq"
}
},
"boost_factor": 15
}
},
{
"custom_boost_factor": {
"query": {
"prefix": {
"name.exact": "Maaz Tariq"
}
},
"boost_factor": 10
}
},
{
"custom_boost_factor": {
"query": {
"match_phrase": {
"name": {
"query": "Maaz Tariq",
"slop": 0
}
}
},
"boost_factor": 5
}
}
]
}
}
}
edit:
As pointed out by javanna, the custom_boost_factor isn't needed.
Query without custom_boost_factor:
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "Maaz Tariq"
}
}
],
"should": [
{
"term": {
"name.exact": {
"value": "Maaz Tariq",
"boost": 15
}
}
},
{
"prefix": {
"name.exact": {
"value": "Maaz Tariq",
"boost": 10
}
}
},
{
"match_phrase": {
"name": {
"query": "Maaz Tariq",
"slop": 0,
"boost": 5
}
}
}
]
}
}
}
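If the search term comes from user input, the query body above can be generated instead of hand-written. The following is a minimal sketch that only builds the same JSON structure as a Python dict; the helper name `build_name_query` and its keyword arguments are hypothetical, and sending the body to Elasticsearch is left out.

```python
import json


def build_name_query(term, exact_boost=15, prefix_boost=10, phrase_boost=5):
    """Build the bool query shown above for an arbitrary search term."""
    return {
        "query": {
            "bool": {
                # Every hit must share at least one token with the term.
                "must": [{"match": {"name": term}}],
                # Optional clauses that only raise the score when they match.
                "should": [
                    {"term": {"name.exact": {"value": term, "boost": exact_boost}}},
                    {"prefix": {"name.exact": {"value": term, "boost": prefix_boost}}},
                    {"match_phrase": {"name": {"query": term, "slop": 0, "boost": phrase_boost}}},
                ],
            }
        }
    }


print(json.dumps(build_name_query("Maaz Tariq"), indent=2))
```

Keeping the boosts as parameters makes it easier to tune the gap between the exact, prefix, and phrase tiers.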
With the Java API, when querying exact strings that contain spaces, use:
CLIENT.prepareSearch(index)
.setQuery(QueryBuilders.queryStringQuery(wordString)
.field(fieldName));
With many other query types, you get no results.
And from Elasticsearch 1.0:
"title": {
"type": "multi_field",
"fields": {
"title": { "type": "string" },
"raw": { "type": "string", "index": "not_analyzed" }
}
}
became:
"title": {
"type": "string",
"fields": {
"raw": { "type": "string", "index": "not_analyzed" }
}
}
https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html
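For mappings with many such fields, the rewrite from the old form to the new one is mechanical. Below is a rough sketch of that transformation on plain dicts; the helper name `upgrade_multi_field` is hypothetical, and it assumes the default sub-field carries the same name as the field itself, as in the example above.

```python
def upgrade_multi_field(name, mapping):
    """Convert a legacy multi_field mapping into the 1.0+ 'fields' form."""
    if mapping.get("type") != "multi_field":
        return mapping  # nothing to convert
    sub_fields = dict(mapping.get("fields", {}))
    # The sub-field named like the field itself becomes the main field.
    main = dict(sub_fields.pop(name, {"type": "string"}))
    if sub_fields:
        main["fields"] = sub_fields
    return main


legacy = {
    "type": "multi_field",
    "fields": {
        "title": {"type": "string"},
        "raw": {"type": "string", "index": "not_analyzed"},
    },
}

print(upgrade_multi_field("title", legacy))
```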
Related
Imagine I have a movie document whose ratings are modelled as nested fields:
"mappings": {
"movie": {
"properties": {
"name": {"type": "text"},
"ratings": {
"type": "nested",
"properties": {
"userId": {"type": "keyword"},
"rating": {"type": "integer"}
}
}
}
}
}
What I want to do is this: given a movie name and a list of user IDs, I want to find the movie and the lowest rating among those users. I managed to construct a query to do the job:
{
"query": {
"bool": {
"must": [{
"match": {
"name": "fake movie name"
}
}],
"filter": {
"nested": {
"path": "ratings",
"query": {
"bool": {
"must": {
"terms": {
"ratings.userId": ["user1", "user2"]
}
}
}
}
}
}
}
},
"aggs": {
"userIdFilter": {
"filter": {
"terms": {
"ratings.userId": ["user1", "user2"]
}
},
"aggs": {
"lowestRating": {
"min": {
"field": "ratings.rating"
}
}
}
}
}
}
Is it possible to add a filter on the lowest rating, so that only documents whose lowest rating is below a certain value are returned?
I hope there is a way to do this without a script. I tried the bucket selector aggregation but could not get a working version. Any ideas?
Thank you
I have a type called jobdetails. It contains the professional experience details of employees. Each employee document has an experience field of type nested.
"experience":
{
"type": "nested",
"properties":
{
"company": {
"type": "string"
},
"title":{
"type": "string"
}
}
}
I would like to know how to fetch employees having only “manager” or “teacher” but not “trainee” experience in their experience field.
For example:
doc 1: experience[
{"company":"xxx", "title":"manager"},
{"company":"xxx", "title":"teacher"},
{"company":"xxx", "title":"trainee manager"}]
doc 2: experience[{"company":"xxx", "title":"manager"}]
doc 3: experience[{"company":"xxx", "title":"teacher"}]
doc 4: experience[
{"company":"xxx", "title":"manager"},
{"company":"xxx", "title":"teacher"}]
The required query should return doc2, doc3, doc4 but not doc1.
A query like the following one should do the trick, i.e. we're looking for documents whose experience.title field contains either manager or teacher but not trainee
{
"query": {
"filtered": {
"filter": {
"bool": {
"must": [
{
"nested": {
"path": "experience",
"filter": {
"terms": {
"experience.title": [
"manager",
"teacher"
]
}
}
}
}
],
"must_not": [
{
"nested": {
"path": "experience",
"filter": {
"terms": {
"experience.title": [
"trainee"
]
}
}
}
}
]
}
}
}
}
}
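To check the must / must_not logic against the four sample documents, it can be replayed in plain Python. This is only a sketch of the boolean logic, not of Elasticsearch itself; it assumes the title field is analyzed into lowercase whitespace-separated tokens, and the helper names are made up for the example.

```python
docs = {
    "doc1": [
        {"company": "xxx", "title": "manager"},
        {"company": "xxx", "title": "teacher"},
        {"company": "xxx", "title": "trainee manager"},
    ],
    "doc2": [{"company": "xxx", "title": "manager"}],
    "doc3": [{"company": "xxx", "title": "teacher"}],
    "doc4": [
        {"company": "xxx", "title": "manager"},
        {"company": "xxx", "title": "teacher"},
    ],
}


def has_any(experience, wanted):
    """True if any nested entry has a title token in `wanted` (like a nested terms filter)."""
    return any(set(e["title"].lower().split()) & wanted for e in experience)


def matches(experience):
    # must: manager or teacher; must_not: trainee
    return has_any(experience, {"manager", "teacher"}) and not has_any(experience, {"trainee"})


result = sorted(name for name, exp in docs.items() if matches(exp))
print(result)
```

doc1 is dropped because one of its nested entries contains the token "trainee", even though its other entries satisfy the must clause.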
I have a query like so:
{
"sort": [
{
"_geo_distance": {
"geo": {
"lat": 39.802763999999996,
"lon": -105.08748399999999
},
"order": "asc",
"unit": "mi",
"mode": "min",
"distance_type": "sloppy_arc"
}
}
],
"query": {
"bool": {
"minimum_number_should_match": 0,
"should": [
{
"match": {
"name": ""
}
},
{
"match": {
"credit": true
}
}
]
}
}
}
I want my search to always return ALL results, just sorted with those which have matching flags closer to the top.
I would like the sorting priority to go something like:
searchTerm (name, a string)
flags (credit/atm/ada/etc, boolean values)
distance
How can this be achieved?
So far, the query you see above is all I've gotten. I haven't been able to figure out how to always return all results, nor how to incorporate the additional queries into the sort.
I don't believe "sort" is the answer you are looking for, actually. I believe you need a trial-and-error approach, starting with a simple "bool" query that holds all your criteria (name, flags, distance). Give the name criterion the most weight (boost), the flags a little less, and the distance calculation even less.
A "bool" "should" gives you a list of documents sorted by each document's _score, and the boost you assign to each criterion determines how strongly it influences that _score.
Also, returning ALL the elements is not difficult: just add a "match_all": {} to your "bool" "should" query.
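To make the weighting idea concrete, here is a toy model of how a "bool" "should" with constant-score clauses ranks documents: each matching clause adds its boost, and a match_all clause guarantees every document gets a score. It deliberately ignores Lucene's normalization factors, and the weights, sample documents, and function name are illustrative assumptions only.

```python
def bool_should_score(doc, clauses):
    """Sum the boosts of all clauses whose predicate matches the document."""
    return sum(boost for predicate, boost in clauses if predicate(doc))


clauses = [
    (lambda d: "something" in d["name"].lower().split(), 6.0),  # name match
    (lambda d: d["credit"] is True, 3.0),                       # flag
    (lambda d: d["atm"] is False, 3.0),                         # flag
    (lambda d: d["ada"] is True, 3.0),                          # flag
    (lambda d: True, 1.0),                                      # match_all
]

docs = [
    {"name": "Plain Branch", "credit": False, "atm": True, "ada": False},
    {"name": "Other Bank", "credit": True, "atm": False, "ada": False},
    {"name": "Something Bank", "credit": True, "atm": True, "ada": True},
]

ranked = sorted(docs, key=lambda d: bool_should_score(d, clauses), reverse=True)
for d in ranked:
    print(bool_should_score(d, clauses), d["name"])
```

Every document is returned because of the match_all clause, and the name match outweighs any single flag.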
This would be a starting point, from my point of view, and, depending on your documents and your requirements (see my comment to your post about the confusion) you would need to adjust the "boost" values and test, adjust again and test again etc:
{
"query": {
"bool": {
"should": [
{ "constant_score": {
"boost": 6,
"query": {
"match": { "name": { "query": "something" } }
}
}},
{ "constant_score": {
"boost": 3,
"query": {
"match": { "credit": { "query": true } }
}
}},
{ "constant_score": {
"boost": 3,
"query": {
"match": { "atm": { "query": false } }
}
}},
{ "constant_score": {
"boost": 3,
"query": {
"match": { "ada": { "query": true } }
}
}},
{
"function_score": {
"functions": [
{
"gauss": {
"geo": {
"origin": {
"lat": 39.802763999999996,
"lon": -105.08748399999999
},
"offset": "2km",
"scale": "3km"
}
}
}
]
}
},
{
"match_all": {}
}
]
}
}
}
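When tuning "offset" and "scale", it helps to see what the gauss function does with them. The sketch below follows the Gaussian decay formula from the Elasticsearch function score documentation, with the default decay of 0.5 assumed; distances are passed in directly in kilometres rather than computed from coordinates, and the function name is made up for the example.

```python
import math


def gauss_decay(distance_km, offset_km=2.0, scale_km=3.0, decay=0.5):
    """Score multiplier in (0, 1]: 1.0 within the offset, `decay` at offset + scale."""
    sigma_sq = -(scale_km ** 2) / (2.0 * math.log(decay))
    effective = max(0.0, distance_km - offset_km)
    return math.exp(-(effective ** 2) / (2.0 * sigma_sq))


for km in (0, 2, 5, 8, 15):
    print(km, round(gauss_decay(km), 4))
```

With these settings, anything within 2 km scores the full value, a document 5 km away scores half, and the score falls off quickly beyond that.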
I'm trying to search my database and be able to use upper/lower case filter terms. I've noticed that while queries apply analyzers, I can't figure out how to apply a lowercase analyzer to a filtered search. Here's the query:
{
"query": {
"filtered": {
"filter": {
"bool": {
"should": [
{
"term": {
"language": "mandarin" // Returns a doc
}
},
{
"term": {
"language": "Italian" // Does NOT return a doc, but will if lowercased
}
}
]
}
}
}
}
}
I have a type languages that I have lowercased using:
"analyzer": {
"lower_keyword": {
"type": "custom",
"tokenizer": "keyword",
"filter": "lowercase"
}
}
and a corresponding mapping:
"mappings": {
"languages": {
"_id": {
"path": "languageID"
},
"properties": {
"languageID": {
"type": "integer"
},
"language": {
"type": "string",
"analyzer": "lower_keyword"
},
"native": {
"type": "string",
"analyzer": "keyword"
},
"meta": {
"type": "nested"
},
"language_suggest": {
"type": "completion"
}
}
}
}
The problem is that you have a field that you have analyzed during index to lowercase it, but you are using a term filter for the query which is not analyzed:
Term Filter
Filters documents that have fields that contain a term (not analyzed).
Similar to term query, except that it acts as a filter.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-term-filter.html
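The mismatch is easy to reproduce with a toy inverted index. The sketch below imitates the lower_keyword analyzer (keyword tokenizer plus lowercase filter) at index time and compares an unanalyzed term lookup with an analyzed lookup; it is a simplified model, and all function names and document ids are assumptions for illustration.

```python
def lower_keyword(text):
    """Keyword tokenizer + lowercase filter: one token, lowercased."""
    return [text.lower()]


# Index time: values pass through the analyzer before being stored.
index = {}
for doc_id, language in enumerate(["Mandarin", "Italian"], start=1):
    for token in lower_keyword(language):
        index.setdefault(token, set()).add(doc_id)


def term_filter(value):
    """Term filter: looks up the value exactly as given, no analysis."""
    return set(index.get(value, set()))


def analyzed_match(value):
    """Analyzed query: the value goes through the same analyzer first."""
    hits = set()
    for token in lower_keyword(value):
        hits |= index.get(token, set())
    return hits


print(term_filter("mandarin"))     # stored token is lowercase, so this hits
print(term_filter("Italian"))      # no stored token has a capital I
print(analyzed_match("Italian"))   # lowercased before lookup
```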
I'd try using a query filter instead:
Query Filter
Wraps any query to be used as a filter. Can be placed within queries
that accept a filter.
Example:
{
"constantScore" : {
"filter" : {
"query" : {
"query_string" : {
"query" : "this AND that OR thus"
}
}
}
} }
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-filter.html#query-dsl-query-filter
This may be achieved by appending .keyword to the field name to query against the keyword version of the field, assuming language has a keyword sub-field defined in the mapping.
Note that now only the exact text would match: mandarin won't match and Italian would.
Your query would end up like this:
{
"query": {
"filtered": {
"filter": {
"bool": {
"should": [
{
"term": {
"language.keyword": "mandarin" // Returns Empty
}
},
{
"term": {
"language.keyword": "Italian" // Returns Italian.
}
}
]
}
}
}
}
}
Combining the values in a single terms filter is also allowed:
{
"query": {
"filtered": {
"filter": {
"bool": {
"should": [
{
"terms": {
"language.keyword": ["mandarin", "Italian"]
}
}
]
}
}
}
}
}
I have the following mapping:
curl -XPUT 'http://localhost:9200/bookstore/user/_mapping' -d '
{
"user": {
"properties": {
"user_id": { "type": "integer" },
"gender": { "type": "string", "index" : "not_analyzed" },
"age": { "type": "integer" },
"age_bracket": { "type": "string", "index" : "not_analyzed" },
"current_city": { "type": "string", "index" : "not_analyzed" },
"relationship_status": { "type": "string", "index" : "not_analyzed" },
"books" : {
"type": "nested",
"properties" : {
"b_oid": { "type": "string", "index" : "not_analyzed" },
"b_name": { "type": "string", "index" : "not_analyzed" },
"bc_id": { "type": "integer" },
"bc_name": { "type": "string", "index" : "not_analyzed" },
"bcl_name": { "type": "string", "index" : "not_analyzed" },
"b_id": { "type": "integer" }
}
}
}
}
}'
Now I am trying to query, for example, for users who have "gender": "Male" and have bought a book in a certain category, "bcl_name": "Trivia", and to show the "b_name" book titles. Somehow I cannot get it to work.
I have the query
curl -XGET 'http://localhost:9200/bookstore/user/_search?pretty=1' -d '{
"size": 0,
"from": 0,
"query": {
"filtered": {
"query": {
"terms": {
"gender": [
"Male"
]
}
}
}
},
"facets": {
"CategoryFacet": {
"terms": {
"field": "books.b_name",
"size": 5,
"shard_size": 1000,
"order": "count"
},
"nested": "books",
"facet_filter": {
"terms": {
"books.bcl_name": [
"Trivia"
]
}
}
}
}
}'
which returns a result, but I'm not sure whether this is correct. I looked for some examples, and found this (http://www.spacevatican.org/2012/6/3/fun-with-elasticsearch-s-children-and-nested-documents/) for example. I'm able to rewrite my query like this:
curl -XGET 'http://localhost:9200/bookstore/user/_search?pretty=1' -d '{
"size": 0,
"from": 0,
"query": {
"filtered": {
"query": {
"terms": {
"gender": [
"Male"
]
}
},
"filter": {
"nested": {
"path": "books",
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"and": [
{
"term": {
"books.bcl_name": "Trivia"
}
}
]
}
}
}
}
}
}
},
"facets": {
"CategoryFacet": {
"terms": {
"field": "books.b_name",
"size": 5,
"shard_size": 1000,
"order": "count"
},
"nested": "books"
}
}
}'
which shows different results.
As a beginner, I am a little lost right now. Can someone please give me a hint on how to solve this? Thanks a lot in advance!
The first query means:
Search for users whose gender is "Male".
But its "CategoryFacet" carries an extra facet_filter, so it counts only entries where gender is "Male" AND books.bcl_name is "Trivia".
So the result set contains all "Male" users, while the CategoryFacet gives you the counts for "Male users whose books.bcl_name is Trivia".
In the second query, the "CategoryFacet" has no extra filtering. It just returns the facets computed from the exact result set.
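The difference between the two facets can be replayed on a few made-up users. This is a sketch of my reading of the two queries, not of Elasticsearch internals: the first counts only the "Trivia" books of all male users, while the second first narrows the users to males who own a "Trivia" book and then counts all of their books. The sample data and function names are hypothetical.

```python
from collections import Counter

users = [
    {"gender": "Male", "books": [
        {"b_name": "Quiz Night", "bcl_name": "Trivia"},
        {"b_name": "Cooking 101", "bcl_name": "Food"},
    ]},
    {"gender": "Male", "books": [{"b_name": "Cooking 101", "bcl_name": "Food"}]},
    {"gender": "Female", "books": [{"b_name": "Quiz Night", "bcl_name": "Trivia"}]},
]


def first_query_facet(users):
    """All male users; the facet_filter keeps only their Trivia books."""
    counts = Counter()
    for user in users:
        if user["gender"] == "Male":
            for book in user["books"]:
                if book["bcl_name"] == "Trivia":
                    counts[book["b_name"]] += 1
    return counts


def second_query_facet(users):
    """Male users owning a Trivia book; the facet counts all of their books."""
    counts = Counter()
    for user in users:
        owns_trivia = any(b["bcl_name"] == "Trivia" for b in user["books"])
        if user["gender"] == "Male" and owns_trivia:
            for book in user["books"]:
                counts[book["b_name"]] += 1
    return counts


print(dict(first_query_facet(users)))
print(dict(second_query_facet(users)))
```

If the goal is to list only the Trivia titles bought by male users, the first shape (facet_filter on the nested facet) is the one that restricts the counted books.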