Startswith exact word match in Elasticsearch?

I have an index with a title field containing the following data:
jam bread
jamun
jamaica country
So if a user searches for jam, I don't want jamun and jamaica country to appear in the search results. Right now I am using a prefix query in Elasticsearch, but it does not give me the results I want.
{
"query": {
"prefix" : { "title" : "jam" }
}
}

You get the unwanted results because a prefix query behaves like a pattern match (jam*) on the terms in the inverted index, so jamun and jamaica match as well as jam.
Instead of the prefix query, you can use a term query, which does an exact match on the tokenized keyword, like the following:
PUT exact_index1
{
"mappings": {
"document_type" : {
"properties": {
"title" : {
"type": "text"
}
}
}
}
}
POST exact_index1/document_type
{
"title" : "jamun"
}
POST exact_index1/_search
{
"query": {
"term": {
"title": {
"value": "jam"
}
}
}
}
Hope this helps
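To make the difference concrete, here is a toy model in Python rather than Elasticsearch itself. The analyze function is a hypothetical stand-in for the standard analyzer (lowercase, split on non-alphanumerics), and the index is a plain dictionary, so treat it as a sketch of the idea only.

```python
import re

def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

titles = ["jam bread", "jamun", "jamaica country"]

# Inverted index: term -> set of document ids
index = {}
for doc_id, title in enumerate(titles):
    for term in analyze(title):
        index.setdefault(term, set()).add(doc_id)

def prefix_query(prefix):
    """Match every indexed term that starts with the prefix."""
    ids = set()
    for term, docs in index.items():
        if term.startswith(prefix):
            ids |= docs
    return sorted(titles[i] for i in ids)

def term_query(value):
    """Match only documents that contain the exact term."""
    return sorted(titles[i] for i in index.get(value, set()))

print(prefix_query("jam"))  # all three titles
print(term_query("jam"))    # only "jam bread"
```

Because the term lookup compares whole tokens, the search value has to be in the same form as the indexed tokens (lowercase here).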

The completion suggester provides search-as-you-type functionality
PUT - index_name/document_type/_mapping
{
"document_type": {
"properties": {
"title": {
"type": "text"
},
"suggest": {
"type": "completion",
"analyzer": "simple",
"search_analyzer": "simple"
}
}
}
}
POST - index_name/document_type
{
"name": "jamun",
"suggest":
{
"input": "jamun"
},
"output": "jamun"
}
POST - index_name/document_type/_suggest?pretty
{"type-suggest":{"text":"jam","completion":{"field":"suggest"}}}

Related

Elasticsearch query to match on one nested object, but return another

I have an ElasticSearch mapping as follows:
{
"person": {
"properties": {
"accounts": {
"type": "nested",
"properties": {
"username": {
"type": "string"
},
"account_type": {
"type": "string"
}
}
}
}
}
}
I want to write a query that matches the username for at least one of the nested accounts, but only returns the inner hit for the account that matches a specific type.
For example, for a person with accounts
{"accounts": [{"username": "Foo", "type": "foo-type"},
{"username": "Bar", "type": "bar-type"}]}
I want a query that when searching for username "fo*", and type "bar-type", will return the user with an inner hit containing the nested object {"username": "Bar", "type": "bar-type"}, because the user has at least one account that matches the username, but the "bar-type" account type is always the one returned.
My query so far, looks like:
{
"query": {
"filtered": {
"query": {
"nested": {
"inner_hits": {
"size": 1
},
"path": "accounts",
"query": {
"bool": {
"must": [
{
"query": {
"wildcard": {
"accounts.username": {
"value": "fo*"
}
}
}
}
]
}
}
}
}
}
}
}
but (for obvious reasons) this returns the inner hit that matches the query. I'm not sure how to amend the query to return a different nested object that matches on the type "bar-type" as specified.
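One shape that might work, offered as an untested sketch and not a confirmed answer: use two nested clauses under a bool query, one that only checks the username and one that checks the account type and is the only clause asking for inner hits. The Python below just builds the request body. The helper name nested_clause is made up for illustration, the field name accounts.account_type is taken from the mapping (the example document calls it type), and match_phrase is an assumption chosen because the field is an analyzed string.

```python
import json

def nested_clause(query, inner_hits=None):
    """Build one nested clause on the accounts path (illustrative helper)."""
    clause = {"path": "accounts", "query": query}
    if inner_hits is not None:
        clause["inner_hits"] = inner_hits
    return {"nested": clause}

body = {
    "query": {
        "bool": {
            "must": [
                # Clause 1: the person has some account whose username matches.
                nested_clause({"wildcard": {"accounts.username": {"value": "fo*"}}}),
                # Clause 2: the person has a bar-type account.
                # Only this clause requests inner hits.
                nested_clause(
                    {"match_phrase": {"accounts.account_type": "bar-type"}},
                    inner_hits={"size": 1},
                ),
            ]
        }
    }
}

print(json.dumps(body, indent=2))
```

The two clauses are evaluated independently per person, so the account that satisfies the username clause does not have to be the one returned as the inner hit.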

How to make elastic search only match full field

I have a query looking like this:
((company_id:1) AND (candidate_tags:"designer"))
However this also matches users where candidate_tags is interaction designer. How do I exclude these?
Here's my full search body:
{
"query": {
"filtered": {
"query": {
"query_string": {
"query":
"((company_id:1) AND (candidate_tags:\"designer\"))"
}
}
}
},
"sort": [
{
"candidate_rating": {
"order": "desc"
}
},
"candidate_tags",
"_score"
]
}
Extra info
I realised after an answer came in that I should add: candidate_tags is an array of strings. If a candidate has the tags interaction designer and talent, searching for talent should match but searching for designer should not.
Map your candidate_tags field as not_analyzed, or analyze it with the keyword analyzer:
{
"mappings": {
"test": {
"properties": {
"candidate_tags": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
or add a raw field to your existing mapping like this:
{
"mappings": {
"test": {
"properties": {
"candidate_tags": {
"type": "string",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
}
For the first option use the same query as you use now.
For the second option use candidate_tags.raw, like this:
{
"query": {
"filtered": {
"query": {
"query_string": {
"query": "((company_id:1) AND (candidate_tags.raw:\"designer\"))"
}
}
}
}
...
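As a rough illustration of why the not_analyzed (or raw) field excludes interaction designer, here is a small Python model. It is not Elasticsearch code: standard_terms and raw_terms are hypothetical helpers that approximate what an analyzed and a not_analyzed string field store for an array of tags.

```python
import re

def standard_terms(values):
    """Approximate an analyzed string field: lowercase word tokens from every value."""
    return {tok for v in values for tok in re.findall(r"[0-9a-z]+", v.lower())}

def raw_terms(values):
    """Approximate a not_analyzed field: each array value is kept as one exact term."""
    return set(values)

def matches(terms, wanted):
    """Exact lookup of one term in the stored terms."""
    return wanted in terms

candidate_tags = ["interaction designer", "talent"]

print(matches(standard_terms(candidate_tags), "designer"))  # analyzed field: matches
print(matches(raw_terms(candidate_tags), "designer"))       # raw field: no match
print(matches(raw_terms(candidate_tags), "talent"))         # raw field: matches
```

In the analyzed field the tag is split into interaction and designer, so designer is found; in the raw field the only stored terms are the complete tags.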
Another way is to use script:
POST test/t/1
{
"q":"a b"
}
POST test/t/2
{
"q":"a c"
}
POST test/t/_search
{
"query": {
"filtered": {
"filter": {
"script": {
"script": "return _source.q=='a b'"
}
}
}
}
}
1. By filtering.
2. By making the candidate_tags field an exact-value field, i.e. a not_analyzed field (Andrei Stefan's solution, answered above).
With #2, be careful that you don't later mix the field that is not_analyzed with those that are. More: https://www.elastic.co/guide/en/elasticsearch/guide/current/_exact_value_fields.html
With #1, your query would look something like this (written from memory; I don't have ES on me so can't verify):
{
"query": {
"filtered": {
"query": {
"query_string": {
"query":
"((company_id:1) AND (candidate_tags:\"designer\"))"
}
},
"filter" : {
"term" : {
"candidate_tags" : "designer"
}
}
}
},
"sort": [
{
"candidate_rating": {
"order": "desc"
}
},
"candidate_tags",
"_score"
]
}

Elasticsearch lowercase filter search

I'm trying to search my database using filter terms in either upper or lower case. I've noticed that while queries apply analyzers, I can't figure out how to apply a lowercase analyzer to a filtered search. Here's the query:
{
"query": {
"filtered": {
"filter": {
"bool": {
"should": [
{
"term": {
"language": "mandarin" // Returns a doc
}
},
{
"term": {
"language": "Italian" // Does NOT return a doc, but will if lowercased
}
}
]
}
}
}
}
}
I have a type languages that I have lowercased using:
"analyzer": {
"lower_keyword": {
"type": "custom",
"tokenizer": "keyword",
"filter": "lowercase"
}
}
and a corresponding mapping:
"mappings": {
"languages": {
"_id": {
"path": "languageID"
},
"properties": {
"languageID": {
"type": "integer"
},
"language": {
"type": "string",
"analyzer": "lower_keyword"
},
"native": {
"type": "string",
"analyzer": "keyword"
},
"meta": {
"type": "nested"
},
"language_suggest": {
"type": "completion"
}
}
}
}
The problem is that the field is analyzed at index time, which lowercases it, but you are querying it with a term filter, whose input is not analyzed:
Term Filter
Filters documents that have fields that contain a term (not analyzed).
Similar to term query, except that it acts as a filter.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-term-filter.html
I'd try using a query filter instead:
Query Filter
Wraps any query to be used as a filter. Can be placed within queries
that accept a filter.
Example:
{
"constantScore" : {
"filter" : {
"query" : {
"query_string" : {
"query" : "this AND that OR thus"
}
}
}
} }
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-filter.html#query-dsl-query-filter
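The mismatch between index-time analysis and an unanalyzed term filter can be sketched in a few lines of Python. This is a toy model, not Elasticsearch: lower_keyword mimics the analyzer of that name from the question, and the indexed values Mandarin and Italian are assumed for illustration.

```python
def lower_keyword(value):
    """Mimic the lower_keyword analyzer: keyword tokenizer (one token) plus lowercase."""
    return [value.lower()]

# Terms stored for the language field after index-time analysis.
indexed_terms = set()
for language in ["Mandarin", "Italian"]:  # assumed source values
    indexed_terms.update(lower_keyword(language))

def term_filter(value):
    """A term filter compares its input as-is, with no analysis."""
    return value in indexed_terms

def analyzed_lookup(value):
    """What an analyzed query would do: run the input through the analyzer first."""
    return all(token in indexed_terms for token in lower_keyword(value))

print(term_filter("mandarin"))     # True: already lowercase
print(term_filter("Italian"))      # False: index holds "italian"
print(analyzed_lookup("Italian"))  # True: input is lowercased before lookup
```

Lowercasing the input in the client before building the term filter has the same effect as the analyzed lookup here.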
This can be achieved by appending .keyword to the field name, so that the query runs against the keyword version of the field. This assumes the language field has a keyword sub-field in the mapping.
Note that only the exact text then matches: mandarin won't match and Italian will.
Your query would end up like this:
{
"query": {
"filtered": {
"filter": {
"bool": {
"should": [
{
"term": {
"language.keyword": "mandarin" // Returns Empty
}
},
{
"term": {
"language.keyword": "Italian" // Returns Italian.
}
}
]
}
}
}
}
}
Several values can be combined with a terms filter (note terms, not term):
{
"query": {
"filtered": {
"filter": {
"bool": {
"should": [
{
"terms": {
"language.keyword":
["mandarin", "Italian"]
}
}
]
}
}
}
}
}

Elasticsearch wildcard search on not_analyzed field

I have an index with the following settings and mapping:
{
"settings":{
"index":{
"analysis":{
"analyzer":{
"analyzer_keyword":{
"tokenizer":"keyword",
"filter":"lowercase"
}
}
}
}
},
"mappings":{
"product":{
"properties":{
"name":{
"analyzer":"analyzer_keyword",
"type":"string",
"index": "not_analyzed"
}
}
}
}
}
I am struggling to implement a wildcard search on the name field. My example data looks like this:
[
{"name": "SVF-123"},
{"name": "SVF-234"}
]
When I run the following query:
curl -XPOST "http://localhost:9200/my_index/product/_search" -d '
{
"query": {
"filtered" : {
"query" : {
"query_string" : {
"query": "*SVF-1*"
}
}
}
}
}'
It returns both SVF-123 and SVF-234. I think it is still tokenizing the data. It should return only SVF-123.
Could you please help on this?
Thanks in advance
There are a couple of things going wrong here.
First, you say that you don't want the terms analyzed at index time (not_analyzed), but you also configure an analyzer that is used at search time and lowercases the terms, so the search terms are incompatible with the indexed ones.
Second, by default all terms also end up in the _all field, analyzed with the standard analyzer. That is where you end up searching, because your query_string does not name a field. Since the standard analyzer tokenizes on "-", you end up with an OR of "*SVF" and "1*".
Try running a terms facet on _all and on name to see what's going on.
Here's a runnable Play and gist: https://www.found.no/play/gist/3e5fcb1b4c41cfc20226 (https://gist.github.com/alexbrasetvik/3e5fcb1b4c41cfc20226)
You need to make sure the terms you index are compatible with what you search for. You probably also want to disable _all, since it can muddy what's going on.
#!/bin/bash
export ELASTICSEARCH_ENDPOINT="http://localhost:9200"
# Create indexes
curl -XPUT "$ELASTICSEARCH_ENDPOINT/play" -d '{
"settings": {
"analysis": {
"text": [
"SVF-123",
"SVF-234"
],
"analyzer": {
"analyzer_keyword": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase"
]
}
}
}
},
"mappings": {
"type": {
"properties": {
"name": {
"type": "string",
"index": "not_analyzed",
"analyzer": "analyzer_keyword"
}
}
}
}
}'
# Index documents
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_bulk?refresh=true" -d '
{"index":{"_index":"play","_type":"type"}}
{"name":"SVF-123"}
{"index":{"_index":"play","_type":"type"}}
{"name":"SVF-234"}
'
# Do searches
# See all the generated terms.
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
"facets": {
"name": {
"terms": {
"field": "name"
}
},
"_all": {
"terms": {
"field": "_all"
}
}
}
}
'
# match analyzes the query text (lowercasing it), so it does not match the not_analyzed term
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
"query": {
"match": {
"name": {
"query": "SVF-123"
}
}
}
}
'
# A term query does not run its input through `analyzer_keyword`, so it matches. (Note: term, not match)
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
"query": {
"term": {
"name": {
"value": "SVF-123"
}
}
}
}
'
# _all is analyzed with the standard analyzer, so the lowercased token svf matches
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
"query": {
"term": {
"_all": {
"value": "svf"
}
}
}
}
'
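The OR of "*SVF" and "1*" described above can be mimicked with a toy model in Python. This is only an illustration of the explanation, not how Elasticsearch parses queries internally: standard_terms and split_wildcard_query are hypothetical helpers, and lowercasing the wildcard pieces is an assumption.

```python
import re
from fnmatch import fnmatchcase

def standard_terms(text):
    """Approximate the standard analyzer on _all: lowercase alphanumeric tokens."""
    return re.findall(r"[0-9a-z]+", text.lower())

docs = ["SVF-123", "SVF-234"]
all_field = {name: standard_terms(name) for name in docs}

def split_wildcard_query(query):
    """Split the query text on '-' as the answer describes, lowercasing each piece."""
    return [part.lower() for part in query.split("-")]

def search_any(query):
    """Return documents where at least one pattern matches at least one term (OR)."""
    patterns = split_wildcard_query(query)
    return [
        name
        for name, terms in all_field.items()
        if any(fnmatchcase(term, pattern) for pattern in patterns for term in terms)
    ]

print(all_field)              # terms stored per document
print(search_any("*SVF-1*"))  # both documents, because "*svf" matches "svf" in each
```

Both documents share the token svf, so the "*svf" half of the OR is enough to return SVF-234 even though "1*" only matches 123.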
My solution adventure
I started out as described in my question. Whenever I changed one part of my settings, one thing started to work but another stopped working. Here is the history of my solution:
1.) I indexed my data with the default settings, which means it was analyzed by default. This caused a problem. For example:
When a user searched for a keyword like SVF-1, the system ran this query:
{
"query": {
"filtered" : {
"query" : {
"query_string" : {
"analyze_wildcard": true,
"query": "*SVF-1*"
}
}
}
}
}
and the results were:
SVF-123
SVF-234
This is expected, because the name field of my documents is analyzed. The query is split into the tokens SVF and 1, and SVF matches both documents even where 1 does not. I abandoned this approach and created a mapping that makes my fields not_analyzed:
{
"mappings":{
"product":{
"properties":{
"name":{
"type":"string",
"index": "not_analyzed"
},
"site":{
"type":"string",
"index": "not_analyzed"
}
}
}
}
}
but my problem continued.
2.) After a lot of research I tried another way and decided to use a wildcard query.
My query is:
{
"query": {
"wildcard" : {
"name" : {
"value" : "*SVF-1*"
}
}
},
"filter":{
"term": {"site":"pro_en_GB"}
}
}
This query worked, but with one problem: my fields are no longer analyzed, and a wildcard query on them is case-sensitive. If I search for svf-1, it returns nothing, and users may well type the query in lowercase.
3.) I changed my document structure to:
{
"mappings":{
"product":{
"properties":{
"name":{
"type":"string",
"index": "not_analyzed"
},
"nameLowerCase":{
"type":"string",
"index": "not_analyzed"
},
"site":{
"type":"string",
"index": "not_analyzed"
}
}
}
}
}
I added one more field for the name, called nameLowerCase. When I index a document, I set it up like this:
{
"name": "SVF-123",
"nameLowerCase": "svf-123",
"site": "pro_en_GB"
}
I convert the query keyword to lowercase, search on the new nameLowerCase field, and display the name field.
The final version of my query is:
{
"query": {
"wildcard" : {
"nameLowerCase" : {
"value" : "*svf-1*"
}
}
},
"filter":{
"term": {"site":"pro_en_GB"}
}
}
Now it works. This problem can also be solved with multi_field. My query contains a dash (-), which is what caused some of the problems.
Many thanks to @Alex Brasetvik for his detailed explanation and effort.
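The case-sensitivity problem and the nameLowerCase workaround can be illustrated with a short Python sketch. It models a wildcard query on a not_analyzed field as a case-sensitive pattern match against the whole stored value; the wildcard helper is hypothetical and stands in for the real query.

```python
from fnmatch import fnmatchcase

def wildcard(field_values, pattern):
    """Match the pattern against each whole, untokenized value, case-sensitively."""
    return [value for value in field_values if fnmatchcase(value, pattern)]

names = ["SVF-123", "SVF-234"]
names_lower = [name.lower() for name in names]  # what nameLowerCase would hold

user_input = "svf-1"
pattern = "*" + user_input.lower() + "*"  # lowercase the keyword before searching

print(wildcard(names, "*SVF-1*"))    # ['SVF-123']: the dash is part of the single term
print(wildcard(names, pattern))      # []: lowercase pattern vs. uppercase terms
print(wildcard(names_lower, pattern))  # ['svf-123']: both sides lowercased
```

Because the value is never tokenized, the dash stays inside the term, which is why the pattern can distinguish SVF-123 from SVF-234.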
Adding to Hüseyin's answer: we can use AND as the default operator. SVF and 1* are then joined with AND, which gives the correct results.
"query": {
"filtered" : {
"query" : {
"query_string" : {
"default_operator": "AND",
"analyze_wildcard": true,
"query": "*SVF-1*"
}
}
}
}
@Viduranga Wijesooriya, as you stated, "default_operator": "AND" checks for the presence of both SVF and 1, but an exact match alone is still not possible.
It does filter the results more appropriately, leaving all combinations of SVF and 1 and sorting them by relevance, which promotes SVF-1 up the order.
To pull out the exact result, use:
"settings": {
"analysis": {
"analyzer": {
"analyzer_keyword": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase"
]
}
}
}
},
"mappings": {
"type": {
"properties": {
"name": {
"type": "string",
"analyzer": "analyzer_keyword"
}
}
}
}
and the query is:
{
"query": {
"bool": {
"must": [
{
"query_string" : {
"fields": ["name"],
"query" : "*svf-1*",
"analyze_wildcard": true
}
}
]
}
}
}
result
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "play",
"_type": "type",
"_id": "AVfXzn3oIKphDu1OoMtF",
"_score": 1,
"_source": {
"name": "SVF-123"
}
}
]
}
}

Boosting matched documents in Elasticsearch which have a certain tag

I have an index of documents that look like this:
{
url: "/foo/bar",
html_blocks: [
"<h1>hi</h1>"
],
tags: [
"video",
"text"
],
title: "My title"
}
I'd like to query these documents on the title and html_blocks fields, and for any matches add a boost if they have a video tag.
So far, my query looks like this:
{
"query": {
"query_string": {
"query": "foo",
"fields": [
"title",
"html_blocks"
]
}
}
}
How do I modify it so that it continues to only return results if a match is found in the existing query, but a boost is added to any of the results which have a video tag? Thanks!
You want a custom_filters_score query, which boosts only the documents that match one of its filters. Note that filter input is not analyzed, so wrap it in a query if you need it analyzed. Your other options for boosting, though not really suited to this case, are the boosting query, which is good for demoting results, and the custom_score query, which is good for adding boosts based on a calculated value.
See: Custom_filters_score
{
"query": {
"custom_filters_score": {
"query": {
"query_string": {
"query": "foo",
"fields": [
"title",
"html_blocks"
]
}
},
"filters": [
{
"filter": {
"term": {
"tags": "video"
}
},
"boost": 3
}
]
}
}
}
Edit:
This is what I mean by wrapping in a query using a filter query. Trust me, once you get the hang of ES, you'll be nested so knee deep that you'll produce some of the most satisfying queries ever.
{
"query": {
"custom_filters_score": {
"query": {
"query_string": {
"query": "foo",
"fields": [
"title",
"html_blocks"
]
}
},
"filters": [
{
"filter": {
//here comes the filter query, and I changed term to match
//since match analyzes
"query":{
"match": {
"tags": "video"
}
}
},
"boost": 3
}
]
}
}
}
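The effect on ranking can be sketched in Python. This is a simplified model and not the Elasticsearch implementation: it assumes the boost multiplies the text-query score of every hit that passes the filter, and the rescore helper, the sample documents and their base scores are invented for illustration.

```python
def rescore(hits, boost=3.0, tag="video"):
    """hits: list of (doc, base_score) pairs that already matched the text query."""
    rescored = [
        (doc, score * boost if tag in doc["tags"] else score)
        for doc, score in hits
    ]
    # Highest score first, as in a search response.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)

# Only documents that matched the query_string are passed in,
# so the boost never adds new results; it only reorders them.
hits = [
    ({"url": "/a", "tags": ["text"]}, 1.2),
    ({"url": "/b", "tags": ["video", "text"]}, 0.5),
]

ranked = rescore(hits)
for doc, score in ranked:
    print(doc["url"], score)
```

In this model the video-tagged document overtakes a stronger text match once its score is multiplied by 3, while documents without the tag keep their original score.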
