Elasticsearch term suggester return stemmed results - search

why is the elasticsearch term suggester results are stemmed ?
when i do this query:
curl -XPOST 'localhost:9200/posts/_suggest' -d '{
"my-suggestion" : {
"text" : "manger",
"term" : {
"field" : "body"
}
}
}'
the expected result should be "manager" but I get back "manag":
{
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"my-suggest-1":[
{
"text":"mang",
"offset":0,
"length":6,
"options":[
{
"text":"manag",
"score":0.75,
"freq":180
},
{
"text":"mani",
"score":0.75,
"freq":6
}
]
}
]
}
EDIT
i found a solution for my problem: i added a standard analyzer to my query.
curl -XPOST 'localhost:9200/posts/_suggest' -d '{
"my-suggestion" : {
"text" : "manger",
"term" : {
"analyzer" : "standard",
"field" : "body"
}
}
}'
now the results are good:
{
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"my-suggest":[
{
"text":"mang",
"offset":0,
"length":6,
"options":[
{
"text":"manager",
"score":0.75,
"freq":180
},
{
"text":"manuel",
"score":0.75,
"freq":6
}
]
}
]
}
but i've run to another similar problem with agregations:
{
"aggs" : {
"cities" : {
"terms" : { "field" : "location" }
}
}
}
the results i get are trimmed:
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 473,
"max_score": 0.0,
"hits": []
},
"aggregations": {
"cities": {
"buckets": [{
"key": "londr",
"doc_count": 244
}, {
"key": "pari",
"doc_count": 244
}, {
"key": "tang",
"doc_count": 12
}, {
"key": "agad",
"doc_count": 8
}]
}
}
}

Terms aggregation works on "term" that are made from original text via tokenization and stemming. You need to mark field as "not_analyzed" in your index mappings to disable tokenization and stemming.
I never used suggesters, but it think that you need to disable stemming for that field, but enable tokenization. You can have two versions of field in index - one for search (tokenized and stemmed) and one for suggesters (tokenized, but non-stemmed).

Related

Aggregation on an array of objects

I have the following data in my elastic.
{
...someData,
languages: [
{
language:{_id: 1, name:"English"}
},
{
language:{_id: 2, name:"Arabic"}
}
]
}
But when I aggregate the data using this query
aggs: {
languages: {
terms: {
field: "languages.language._id.keyword",
size: 50
},
aggs: {
value: {
terms: {
field: "languages.language.name.keyword"
}
}
}
}
}
I will get the English id with 2 buckets for Arabic and English
and same for Arabic id, because technically its included there.
Is there a way to return only the count of the object I need?
Thanks
You need to define languages field as nested for applying aggregation on individual element of array.
Configured Nested field:
PUT index0
{
"mappings": {
"properties": {
"languages":{
"type": "nested"
}
}
}
}
Sample document index:
POST index0/_doc
{
"languages": [
{
"language": {
"_id": 1,
"name": "English"
}
},
{
"language": {
"_id": 2,
"name": "Arabic"
}
}
]
}
Sample Aggregation Query:
{
"size": 0,
"aggs": {
"languages": {
"nested": {
"path": "languages"
},
"aggs": {
"id": {
"terms": {
"field": "languages.language._id",
"size": 10
},
"aggs": {
"name": {
"terms": {
"field": "languages.language.name.keyword",
"size": 10
}
}
}
}
}
}
}
}
Result:
"aggregations" : {
"languages" : {
"doc_count" : 2,
"id" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : 1,
"doc_count" : 1,
"name" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "English",
"doc_count" : 1
}
]
}
},
{
"key" : 2,
"doc_count" : 1,
"name" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Arabic",
"doc_count" : 1
}
]
}
}
]
}
}
}
Did you try this :
aggs: {
languages: {
terms: {
field: "languages.language._id.keyword",
size: 50
},
}
}
You do not need the other aggregation. You can access using the doc_count key

Elastic Search multi match query can't ignore special characters

I have a name field value as "abc_name" so when I search "abc_" I am getting proper results but when I search "abc_##£&-#&" still I am getting same results. I want my query to ignore this special characters that doesn't matches with my query.
My query has:
Multi_match
type as cross_fields
operator AND
I am using search_analyzer standard for my Fields
And I want this structure as it is otherwise it will affect my other Search behaviour
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"analyzer": "autocomplete",
"search_analyzer": "standard"
}
Please see the below sample which would fit your use case where I've created a custom analyzer which would fit your use case:
Sample Mapping:
PUT some_test_index
{
"settings": {
"analysis": {
"analyzer": {
"my_custom_analyzer": {
"type": "custom",
"tokenizer": "custom_tokenizer",
"filter": ["lowercase", "3_5_edge_ngram"]
}
},
"tokenizer": {
"custom_tokenizer": {
"type": "pattern",
"pattern": "\\w+_+[^a-zA-Z\\d\\s_]+|\\s+". <---- Note this pattern
}
},
"filter": {
"3_5_edge_ngram": {
"type": "edge_ngram",
"min_gram": 3,
"max_gram": 5
}
}
}
},
"mappings": {
"properties": {
"my_field":{
"type": "text",
"analyzer": "my_custom_analyzer"
}
}
}
}
The above mentioned pattern would simply ignore the tokens with the format like abc_$%^^##. As a result this token would not be indexed.
Note that the way the analyzer works is:
First executes tokenizer
Then applies the edge_ngram filter on the tokens generated.
You can verify by simply removing the edge_ngram filter in the above mapping to first understand what tokens are getting generated via Analyze API which would be as below:
POST some_test_index/_analyze
{
"analyzer": "my_custom_analyzer",
"text": "abc_name asda efg_!##!## 1213_adav"
}
Tokens generated:
{
"tokens" : [
{
"token" : "abc_name",
"start_offset" : 0,
"end_offset" : 8,
"type" : "word",
"position" : 0
},
{
"token" : "asda",
"start_offset" : 9,
"end_offset" : 13,
"type" : "word",
"position" : 1
},
{
"token" : "1213_adav",
"start_offset" : 25,
"end_offset" : 34,
"type" : "word",
"position" : 2
}
]
}
Note that the token efg_!##!## has been removed.
I've added edge_ngram fitler as you would want the search to be successful if you search with abc_ if your tokens generated via tokenizer is abc_name.
Sample Document:
POST some_test_index/_doc/1
{
"my_field": "abc_name asda efg_!##!## 1213_adav"
}
Query Request:
Use-case 1:
POST some_test_index/_search
{
"query": {
"match": {
"my_field": "abc_"
}
}
}
Use-case-2:
POST some_test_index/_search
{
"query": {
"match": {
"my_field": "efg_!##!##"
}
}
}
Responses:
Response for use-case-1:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.47992462,
"hits" : [
{
"_index" : "some_test_index",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.47992462,
"_source" : {
"my_field" : "abc_name asda efg_!##!## 1213_adav"
}
}
]
}
}
Response for use-case-2:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
Updated Answer:
Create your mapping as follows based on the index I've created and let me know if that works:
PUT some_test_index
{
"settings": {
"analysis": {
"analyzer": {
"my_custom_analyzer": {
"type": "custom",
"tokenizer": "punctuation",
"filter": ["lowercase"]
}
},
"tokenizer": {
"punctuation": {
"type": "pattern",
"pattern": "\\w+_+[^a-zA-Z\\d\\s_]+|\\s+"
}
}
}
},
"mappings": {
"properties": {
"my_field":{
"type": "text",
"analyzer": "autocompete", <----- Assuming you have already this in setting
"search_analyzer": "my_custom_analyzer". <----- Note this
}
}
}
}
Please try and let me know if this works for all your use-cases.

How to calculate total for each token in Elasticsearch

I have a request into Elastic
{
"query":{
"bool":{
"must":[
{
"query_string":{
"query":"something1 OR something2 OR something3",
"default_operator":"OR"
}
}
],
"filter":{
"range":{
"time":{
"gte":date
}
}
}
}
}
}
I wanna calculate count for each token in all documents using elastic search in one request, for example:
something1: 26 documents
something2: 12 documents
something3: 1 documents
Assuming that the tokens are not akin to enumerations (i.e. constrained set of specific values, like state names, which would make a terms aggregation your best bet with the right mapping), I think the closest thing to what you want would be to use filters aggregation:
POST your-index/_search
{
"query":{
"bool":{
"must":[
{
"query_string":{
"query":"something1 OR something2 OR something3",
"default_operator":"OR"
}
}
],
"filter":{
"range":{
"time":{
"gte":date
}
}
}
}
},
"aggs": {
"token_doc_counts": {
"filters" : {
"filters" : {
"something1" : {
"bool": {
"must": { "query_string" : { "query" : "something1" } },
"filter": { "range": { "time": { "gte": date } } }
}
},
"something2" : {
"bool": {
"must": { "query_string" : { "query" : "something2" } },
"filter": { "range": { "time": { "gte": date } } }
}
},
"something3" : {
"bool": {
"must": { "query_string" : { "query" : "something3" } },
"filter": { "range": { "time": { "gte": date } } }
}
}
}
}
}
}
}
The response would look something like:
{
"took": 9,
"timed_out": false,
"_shards": ...,
"hits": ...,
"aggregations": {
"token_doc_counts": {
"buckets": {
"something1": {
"doc_count": 1
},
"something2": {
"doc_count": 2
},
"something3": {
"doc_count": 3
}
}
}
}
}
You can split your query into filters aggregation of three filters. For reference look here: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-filters-aggregation.html
What you would need to do, is to create a Copy_To field and have the mapping as shown below.
Depending on the fields that your query_string queries, you need to include some or all of the fields with copy_to field.
By default query_string searches all the fields, so you may need to specify copy_to for all the fields as shown in below mapping, where for sake of simplicity, I've created only three fields, title, field_2 and a third field content which would act as copied to field.
Mapping
PUT <your_index_name>
{
"mappings": {
"mydocs": {
"properties": {
"title": {
"type": "text",
"copy_to": "content"
},
"field_2": {
"type": "text",
"copy_to": "content"
},
"content": {
"type": "text",
"fielddata": true
}
}
}
}
}
Sample Documents
POST <your_index_name>/mydocs/1
{
"title": "something1",
"field_2": "something2"
}
POST <your_index_name>/mydocs/2
{
"title": "something2",
"field_2": "something3"
}
Query:
You'd get the required document counts for the each and every token using the below aggregation query and I've made use of Terms Aggregation:
POST <your_index_name>/_search
{
"size": 0,
"query": {
"query_string": {
"query": "something1 OR something2 OR something3"
}
},
"aggs": {
"myaggs": {
"terms": {
"field": "content",
"include" : ["something1","something2","something3"]
}
}
}
}
Query Response:
{
"took": 7,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0,
"hits": []
},
"aggregations": {
"myaggs": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "something2",
"doc_count": 2
},
{
"key": "something1",
"doc_count": 1
},
{
"key": "something3",
"doc_count": 1
}
]
}
}
}
Let me know if it helps!

Elasticsearch aggrecation give me 2 results insted of one result

I want to aggregate on the brand field and is give me two results instead of one
The brands_aggs give me from this text
{name : "Brand 1"}
2 results
Brand and 1
But Why I need only Brand 1
is separate the word brand and 1 from (Brand 1)
and is give me 2 results in the aggrecation
my mappings where I want to aggregate
mapping = {
"mappings": {
"product": {
"properties": {
"categories": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"fielddata": True
}
"brand": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"fielddata": True
}
}
}
}
}
my post request
{
"query" : {
"bool": {
"must": [
{"match": { "categories": "AV8KW5Wi31qHZdVeXG4G" }}
]
}
},
"size" : 0,
"aggs" : {
"brand_aggs" : {
"terms" : { "field" : "brand" }
},
"categories_aggs" : {
"terms" : { "field" : "categories" }
}
}
}
response from the server
{
"took": 18,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0,
"hits": []
},
"aggregations": {
"categories_aggs": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "av8kw5wi31qhzdvexg4g",
"doc_count": 1
},
{
"key": "av8kw61c31qhzdvexg4h",
"doc_count": 1
},
{
"key": "av8kxtch31qhzdvexg4a",
"doc_count": 1
}
]
},
"brand_aggs": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "1", <==== I dont need this , why is give me that ??
"doc_count": 1
},
{
"key": "brand",
"doc_count": 1
}
]
},
}
}
Your mapping has property fields which is used when you want to have multiple analyzers for the same field. In your case valid name of your field is 'brand.keyword'. When you call your aggregate for just 'brand' it use default mapping defined for string.
So your query should be:
{
"query" : {
"bool": {
"must": [
{"match": { "categories": "AV8KW5Wi31qHZdVeXG4G" }}
]
}
},
"size" : 0,
"aggs" : {
"brand_aggs" : {
"terms" : { "field" : "brand.keyword" }
},
"categories_aggs" : {
"terms" : { "field" : "categories.keyword" }
}
}
}
Property field is useful when you want for example search the same property which multiple analyzers, for example:
"full_name": {
"type": "text",
"analyzer": "standard",
"boost": 1,
"fields": {
"autocomplete": {
"type": "text",
"analyzer": "ngram_analyzer"
},
"standard":{
"type": "text",
"analyzer": "standard"
}
}
},
You need to map your string as not_analyzed string, for that run the below query
PUT your_index/_mapping/your_type
{
"your_type": {
"properties": {
"brand": {
"type": "string",
"index": "analyzed",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
Don't forget to replace the your_type and your_index with your type and index values.

elasticsearch wildcard search with and operator

I am developing an application that uses elastic search, and in some case I want to make a search that according to term and locales. I am testing this on localhost
http://localhost:9200/index/type/_search
and parameters
query : {
wildcard : { "term" : "myterm*" }
},
filter : {
and : [
{
term : { "lang" : "en" }
},
{
term : { "translations.lang" : "tr" } //this is subdocument search
},
]
}
Here is an example document:
{
"_index": "twitter",
"_type": "tweet",
"_id": "5084151d2c6e5d5b11000008",
"_score": null,
"_source": {
"lang": "en",
"term": "photograph",
"translations": [
{
"_id": "5084151d2c6e5d5b11000009",
"lang": "tr",
"translation": "fotoğraf",
"score": "0",
"createDate": "2012-10-21T15:30:37.994Z",
"author": "anonymous"
},
{
"_id": "50850346532b865c2000000a",
"lang": "tr",
"translation": "resim",
"score": "0",
"createDate": "2012-10-22T08:26:46.670Z",
"author": "anonymous"
}
],
"author": "anonymous",
"createDate": "2012-10-21T15:30:37.994Z"
}
}
I am trying to get terms with wildcard(for autocomplete) with input language "en", and output language "tr". It is getting terms that has "myterm" but doesnt apply, and operation on this. Any suggestion would be appreciated
Thanks in advance
I would guess that the translations element has nested type. If this is the case, you should use nested query:
curl -XPOST "http://localhost:9200/twitter/tweet/_search" -d '{
query: {
wildcard: {
"term": "term*"
}
},
filter: {
and: [{
term: {
"lang": "en"
}
}, {
"nested": {
"path": "translations",
"query": {
"term" : { "translations.lang" : "tr" }
}
}
}]
}
}'
I have manage to solve my problem with following query;
query : {
wildcard : { "term" : "myterm*" }
},
filter : {
and : [
{
term : { "lang" : "en" }
},
{
term : { "translations.lang" : "tr" } //this is subdocument search
}
]
},
sort : {
{"term" : "desc"}
}
An important point here is, you need to set your sorting field as not_analyzed. Since, you cannot sort a field that is analyzed.

Resources