Why are my synonyms returning nothing?

I am new to Elasticsearch, and right now I am trying to figure out why my synonyms are not returning any results like I expect them to.
I created a custom filter and analyzer for my synonyms file and applied the analyzer to both the _all field and explicitly defined the specialty field to use it as well.
When I search for "specialty": "aids" without the analyzer/tokenizer, it gives me zero results as expected.
However, when I search for "specialty": "aids" with the analyzer/tokenizer, I expect it to give me the same results as searching for "specialty": "retrovirology", which should yield 3 results, but it comes back with nothing.
Is there something wrong with how I am approaching this?
Here are my settings and some sample data:
curl -XDELETE "http://localhost:9200/personsearch"
curl -XPUT "http://localhost:9200/personsearch" -d'
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"XYZSynAnalyzer": {
"tokenizer": "standard",
"filter": [
"XYZSynFilter"
]
}
},
"filter": {
"XYZSynFilter": {
"type": "synonym",
"synonyms": [
"aids, retrovirology"
]
}
}
}
}
},
"mappings": {
"xyzemployee": {
"_all": {
"analyzer": "XYZSynAnalyzer"
},
"properties": {
"firstName": {
"type": "string"
},
"lastName": {
"type": "string"
},
"middleName": {
"type": "string",
"include_in_all": false,
"index": "not_analyzed"
},
"specialty": {
"type": "string",
"analyzer": "XYZSynAnalyzer"
}
}
}
}
}'
curl -XPUT "http://localhost:9200/personsearch/xyzemployee/1" -d'
{
"firstName": "Don",
"middleName": "W.",
"lastName": "White",
"specialty": "Adult Retrovirology"
}'
curl -XPUT "http://localhost:9200/personsearch/xyzemployee/2" -d'
{
"firstName": "Terrance",
"middleName": "G.",
"lastName": "Gartner",
"specialty": "Retrovirology"
}'
curl -XPUT "http://localhost:9200/personsearch/xyzemployee/3" -d'
{
"firstName": "Carter",
"middleName": "L.",
"lastName": "Taylor",
"specialty": "Pediatric Retrovirology"
}'
# Why is this returning nothing?
curl -XGET "http://localhost:9200/personsearch/xyzemployee/_search?pretty=true" -d'
{
"query": {
"match": {
"specialty": "retrovirology"
}
}
}'

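You aren't lowercasing anywhere, so the indexed token Retrovirology never matches the lowercase synonym rule or the lowercase query term.
Try this: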
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"XYZSynAnalyzer": {
"tokenizer": "standard",
"filter": [
"lowercase", "XYZSynFilter"
]
}
},
"filter": {
"XYZSynFilter": {
"type": "synonym",
"synonyms": [
"aids, retrovirology"
]
}
}
}
}
}
Note: you may want to split your index analyzer and search analyzer, and have only one of them apply the synonyms. Expanding synonyms only at index time keeps searches faster.
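To see why the missing lowercase filter matters, here is a minimal Python sketch of the analysis chain. It is only a simplified model (whitespace splitting instead of the standard tokenizer, and the helper names analyze and matches are made up for illustration), not Elasticsearch code:

```python
# Simplified model of the analysis chain; not Elasticsearch code.
SYNONYMS = {"aids", "retrovirology"}  # one group: "aids, retrovirology"

def analyze(text, lowercase):
    """Split on whitespace, optionally lowercase, then expand synonyms."""
    tokens = set()
    for token in text.split():
        if lowercase:
            token = token.lower()
        # The synonym rule is compared case-sensitively against the token.
        tokens |= SYNONYMS if token in SYNONYMS else {token}
    return tokens

def matches(doc_text, query_text, lowercase):
    """A match query hits when the document and the query share any token."""
    return bool(analyze(doc_text, lowercase) & analyze(query_text, lowercase))

docs = ["Adult Retrovirology", "Retrovirology", "Pediatric Retrovirology"]
for lowercase in (False, True):
    hits = [d for d in docs if matches(d, "aids", lowercase)]
    print(lowercase, len(hits))
```

Without lowercasing, the token Retrovirology is indexed as-is and shares nothing with the expanded query; with lowercasing, all three sample documents match.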


Querying nested objects

Just to make things clear, this is my first day working with Elasticsearch. On to the problem.
I started by creating my index with:
curl -XPUT "http://localhost:9200/users" -d'
{
"mappings": {
"user": {
"properties": {
"education": {
"type": "nested"
},
"job": {
"type": "nested"
}
}
}
}
}'
and then
curl -XPOST "http://localhost:9200/users/user/" -d'
{
"name": "User A",
"education": [
{
"school": "School A1",
"course": "Course A1"
},
{
"school": "School A2",
"course": "Course A2"
}
]
}'
The problem that I'm facing now is the query part. I'm trying to get results with:
curl -XPOST "http://localhost:9200/users/user/_search?pretty" -d'
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"nested": {
"path": "education",
"filter": {
"bool": {
"must": [
{
"term": {
"education.school": "School A1"
}
}
]
}
}
}
}
}
}
}'
But nothing is getting returned.
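According to the mapping you provided, the school field is analyzed (it is dynamically mapped as a string and uses the default standard analyzer).
Analyzed means the text School A1 is split on whitespace and lowercased, so it is indexed as the tokens school and a1.
You are searching with a term query, which is not analyzed and looks for the exact term School A1, so it never matches. See the term query documentation for details.
You can use query_string with default_operator set to AND: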
curl -XPOST "http://localhost:9200/users/user/_search?pretty" -d'
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"nested": {
"path": "education",
"filter": {
"bool": {
"must": [
{
"query": {
"query_string": {
"default_field": "education.school",
"query": "School A1",
"default_operator": "AND"
}
}
}
]
}
}
}
}
}
}
}'
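The difference between the two approaches can be sketched in a few lines of Python. This is a rough stand-in for the standard analyzer (whitespace split plus lowercase), and the function names are hypothetical, chosen only for this illustration:

```python
def standard_analyze(text):
    """Rough stand-in for the standard analyzer: split on whitespace, lowercase."""
    return [token.lower() for token in text.split()]

def term_matches(indexed_tokens, term):
    # A term query is not analyzed: the literal string must equal one token.
    return term in indexed_tokens

def and_query_matches(indexed_tokens, query_text):
    # query_string with default_operator AND: analyze, then require every token.
    return all(t in indexed_tokens for t in standard_analyze(query_text))

indexed = standard_analyze("School A1")  # tokens stored for education.school

print(term_matches(indexed, "School A1"))       # exact term vs. analyzed tokens
print(and_query_matches(indexed, "School A1"))  # analyzed query, all tokens required
print(and_query_matches(indexed, "School A2"))  # a2 is not among the tokens
```

The term lookup fails because no single indexed token equals the whole string, while the analyzed AND query succeeds only when every query token is present.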
Just leaving my 2 cents here: I would avoid the filtered query, as it is deprecated in the latest release of ES.
Here is the above query rewritten without the filtered query:
curl -XPOST "http://localhost:9200/users/user/_search?pretty" -d'
{
"query": {
"nested": {
"path": "education",
"query": {
"bool": {
"must": [
{
"query_string": {
"default_field": "education.school",
"query": "School A1",
"default_operator": "AND"
}
}
]
}
}
}
}
}'
I followed this doc to write the above query.

ElasticSearch query stops working with big amount of data

The problem: I have 2 indexes that are identical in terms of settings and mappings.
The first index contains only 1 document.
The second index contains the same document plus 16M others.
When I run the query on the first index it returns the document, but when I run the same query on the second, I receive nothing.
Index settings:
{
"tasks_test": {
"settings": {
"index": {
"analysis": {
"analyzer": {
"tag_analyzer": {
"filter": [
"lowercase",
"tag_filter"
],
"tokenizer": "whitespace",
"type": "custom"
}
},
"filter": {
"tag_filter": {
"type": "word_delimiter",
"type_table": "# => ALPHA"
}
}
},
"creation_date": "1444127141035",
"number_of_replicas": "2",
"number_of_shards": "5",
"uuid": "wTe6WVtLRTq0XwmaLb7BLg",
"version": {
"created": "1050199"
}
}
}
}
}
Mappings:
{
"tasks_test": {
"mappings": {
"Task": {
"dynamic": "false",
"properties": {
"format": "dateOptionalTime",
"include_in_all": false,
"type": "date"
},
"is_private": {
"type": "boolean"
},
"last_timestamp": {
"type": "integer"
},
"name": {
"analyzer": "tag_analyzer",
"type": "string"
},
"project_id": {
"include_in_all": false,
"type": "integer"
},
"user_id": {
"include_in_all": false,
"type": "integer"
}
}
}
}
}
The document:
{
"_index": "tasks_test",
"_type": "Task",
"_id": "1",
"_source": {
"is_private": false,
"name": "135548- test with number",
"project_id": 2,
"user_id": 1
}
}
The query:
{
"query": {
"filtered": {
"query": {
"bool": {
"must": [
[
{
"match": {
"_all": {
"query": "135548",
"type": "phrase_prefix"
}
}
}
]
]
}
},
"filter": {
"bool": {
"must": [
{
"term": {
"is_private": false
}
},
{
"terms": {
"project_id": [
2
]
}
},
{
"terms": {
"user_id": [
1
]
}
}
]
}
}
}
}
}
Also, some findings:
If I replace _all with name, everything works.
If I replace match_phrase_prefix with match_phrase, it works too.
ES version: 1.5.1
So, the question is: how can I make the query work on the second index without the hacks mentioned above?

Elasticsearch use custom analyzer on filter

I asked in "elasticsearch nested filter return empty result" about a query that returned no results, and the answer pointed out that the expression I use in the filter isn't analyzed the way I expected.
I have a custom analyzer that does the work. How can I tell the filter in the following query to use this custom analyzer?
GET /develop/_search?search_type=dfs_query_then_fetch
{
"query": {
"filtered" : {
"query": {
"bool": {
"must": [
{ "match": { "title": "post" }}
]
}
},
"filter": {
"bool": {
"must": [
{"term": {
"featured": 0
}},
{
"nested": {
"path": "seller",
"filter": {
"bool": {
"must": [
{ "term": { "seller.firstName": "Test 3" } }
]
}
},
"_cache" : true
}}
]
}
}
}
},
"sort": [
{
"_score":{
"order": "desc"
}
},{
"created": {
"order": "desc"
}
}
],
"track_scores": true
}
Here is a setup that seems to do what you want. I used the same basic code as the last answer, but used index_analyzer and search_analyzer in the index definition as follows:
curl -XDELETE "http://localhost:9200/my_index"
curl -XPUT "http://localhost:9200/my_index" -d'
{
"settings": {
"number_of_shards": 1,
"number_of_replicas": 0,
"analysis": {
"filter": {
"snowball": { "type": "snowball", "language": "English" },
"english_stemmer": { "type": "stemmer", "language": "english" },
"english_possessive_stemmer": { "type": "stemmer", "language": "possessive_english" },
"stopwords": { "type": "stop", "stopwords": [ "_english_" ] },
"worddelimiter": { "type": "word_delimiter" }
},
"tokenizer": {
"nGram": { "type": "nGram", "min_gram": 3, "max_gram": 20 }
},
"analyzer": {
"custom_analyzer": {
"type": "custom",
"tokenizer": "nGram",
"filter": [
"stopwords",
"asciifolding",
"lowercase",
"snowball",
"english_stemmer",
"english_possessive_stemmer",
"worddelimiter"
]
},
"custom_search_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"stopwords",
"asciifolding",
"lowercase",
"snowball",
"english_stemmer",
"english_possessive_stemmer",
"worddelimiter"
]
}
}
}
},
"mappings": {
"posts": {
"properties": {
"title": {
"type": "string",
"analyzer": "custom_analyzer",
"boost": 5
},
"seller": {
"type": "nested",
"properties": {
"firstName": {
"type": "string",
"index_analyzer": "custom_analyzer",
"search_analyzer": "custom_search_analyzer",
"boost": 3
}
}
}
}
}
}
}'
Then I added the test docs:
curl -XPUT "http://localhost:9200/my_index/posts/1" -d'
{"title": "post", "seller": {"firstName":"Test 1"}}'
curl -XPUT "http://localhost:9200/my_index/posts/2" -d'
{"title": "post", "seller": {"firstName":"Test 2"}}'
curl -XPUT "http://localhost:9200/my_index/posts/3" -d'
{"title": "post", "seller": {"firstName":"Test 3"}}'
Then a couple of match queries in a bool, one of which is a multi-word query, seem to accomplish what you want:
curl -XPOST "http://localhost:9200/my_index/_search" -d'
{
"query": {
"bool": {
"must": [
{
"match": {
"title": "post"
}
},
{
"nested": {
"path": "seller",
"query": {
"match": {
"seller.firstName": {
"query": "Test 3",
"operator": "and"
}
}
}
}
}
]
}
}
}'
...
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 6.8380365,
"hits": [
{
"_index": "my_index",
"_type": "posts",
"_id": "3",
"_score": 6.8380365,
"_source": {
"title": "post",
"seller": {
"firstName": "Test 3"
}
}
}
]
}
}
Here is the code I used:
http://sense.qbox.io/gist/8cd954aa60be8c44f64e4282e15e6b565c945ecb
Does that solve your problem?

How to do an exact match in Elasticsearch?

Here is my updated mapping:
curl -X PUT localhost:9200/testing/listings/_mapping -d '{
"listings" : {
"properties" : {
"address" : {
"properties": {
"location": { "type" : "string",
"index" : "not_analyzed"
}
}
},
"suggest" : { "type" : "completion",
"index_analyzer" : "simple",
"search_analyzer" : "simple",
"payloads" : true
}
}
}
}'
The resulting mapping on the index is as follows:
{
"testing": {
"mappings": {
"listings": {
"properties": {
"address": {
"properties": {
"city": {
"type": "string"
},
"line1": {
"type": "string"
},
"line2": {
"type": "string"
},
"line3": {
"type": "string"
},
"location": {
"type": "string",
"index": "not_analyzed"
},
"pincode": {
"type": "string"
}
}
},
"title": {
"type": "string"
}
}
}
}
}
}
But my data is still not matching.
My sample data is:
{
"listings": {
"title": "testing 3",
"address": {
"line1": "3rd cross",
"line2": "6th main",
"line3": "",
"landmark": "",
"location": "k r puram",
"pincode": "",
"city": "Bangalore"
}
}
}
When I query for k r puram, I get the matching results.
But when I query for r r puram or r k puram, I also get the results that belong to k r puram.
I only have listings for k r puram, so any other location should return empty results.
This is my query:
{
"query": {
"bool": {
"must": [
{
"match": {
"published": true
}
},
{
"match": {
"inActive": false
}
},
{
"range": {
"propertyDetailsCategory.build_up_area": {
"lte": 200
}
}
},
{
"match": {
"type": "commercial"
}
},
{
"match": {
"purpose": "rent"
}
},
{
"range": {
"commercialsCategory.exp_rent": {
"lte": 50000
}
}
},
{
"match": {
"address.location": "k r puram"
}
}
]
}
}
}
If the data is exactly "k r puram" and you're searching for exactly "k r puram", then you shouldn't use an analyser.
When inserting data, the default behaviour in Elasticsearch is to use the standard analyser.
To disable this, use
"index": "not_analyzed"
in the mapping for the appropriate field.
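As a rough illustration of why r r puram still matches on an analyzed field, here is a small Python sketch. It assumes a simplified analyzer (whitespace split plus lowercase) and the default OR behaviour of a match query; the function names are made up for this example:

```python
def analyzed_match(doc_value, query):
    """Default match on an analyzed field: any shared token is a hit (OR)."""
    doc_tokens = set(doc_value.lower().split())
    return any(token in doc_tokens for token in query.lower().split())

def not_analyzed_match(doc_value, query):
    """With "index": "not_analyzed" the whole value is indexed as a single term."""
    return doc_value == query

location = "k r puram"
for query in ("k r puram", "r r puram", "r k puram"):
    print(query, analyzed_match(location, query), not_analyzed_match(location, query))
```

On the analyzed field every query that shares a token such as puram is a hit, whereas the not_analyzed comparison only accepts the exact value.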
If your mapping is as follows:
curl -XPOST http://localhost:9200/index/address/_mapping -d '
{"address": {
"properties": {
"city": {"type": "string"},
"line1": {"type": "string"},
"line2": {"type": "string"},
"line3": {"type": "string"},
"location": { "type": "string", "index": "not_analyzed"},
"pincode": {"type": "string"}
}}}'
then your data must match it. For example, this doesn't match it:
curl -XPOST http://localhost:9200/index/address/ -d '
{"title":"testing",
"address":
{"line1":"#51",
"line2":"3rd cross",
"line3":"6th main",
"location":"k r puram",
"pincode":"560041"}}'
This however does match (my modifications):
curl -XPOST http://localhost:9200/index/address/ -d '
{"line1":"#51",
"line2":"3rd cross",
"line3":"6th main",
"location":"k r puram",
"pincode":"560041"}'
And this query finds the document as expected:
curl -XGET http://localhost:9200/index/address/_search -d '
{
"query" :{"match" : {"location": "k r puram"}}
}'
If you can't change your data, then add the extra level to the mapping, e.g.:
curl -XPOST http://localhost:9200/index/address3/_mapping -d '{
"address3" : {
"properties" : {
"address" : {
"properties" : {
"city" : {
"type" : "string"
},
"line1" : {
"type" : "string"
},
"line2" : {
"type" : "string"
},
"location" : {
"type" : "string", "index": "not_analyzed"
}
}
},
"title" : {
"type" : "string"
}
}
}
}'
Again the query works well:
curl -XGET http://localhost:9200/index/address3/_search -d '
{
"query" :{"match" : {"address.location": "k r puram"}}
}'
Have you tried this? (Use the .raw sub-field to match against the non-tokenized value.)
{"query":{
"bool":{
"must":[
{"match":{"published":true}},
{"match":{"inActive":false}},
{"range":{"propertyDetailsCategory.build_up_area":{"lte":200}}},
{"match":{"type":"commercial"}},
{"match":{"purpose":"rent"}},
{"range":{"commercialsCategory.exp_rent":{"lte":50000}}},
{"match":{"address.location.raw": "k r puram"}}
]
}
}
}
Try this query on your old mapping; it should work :)

How to use facet filtering with nested documents on ElasticSearch

I have the following mapping:
curl -XPUT 'http://localhost:9200/bookstore/user/_mapping' -d '
{
"user": {
"properties": {
"user_id": { "type": "integer" },
"gender": { "type": "string", "index" : "not_analyzed" },
"age": { "type": "integer" },
"age_bracket": { "type": "string", "index" : "not_analyzed" },
"current_city": { "type": "string", "index" : "not_analyzed" },
"relationship_status": { "type": "string", "index" : "not_analyzed" },
"books" : {
"type": "nested",
"properties" : {
"b_oid": { "type": "string", "index" : "not_analyzed" },
"b_name": { "type": "string", "index" : "not_analyzed" },
"bc_id": { "type": "integer" },
"bc_name": { "type": "string", "index" : "not_analyzed" },
"bcl_name": { "type": "string", "index" : "not_analyzed" },
"b_id": { "type": "integer" }
}
}
}
}
}'
Now I am trying to query, for example, for users who have "gender": "Male" and have bought a book in a certain category, "bcl_name": "Trivia", and to show the "b_name" book titles. Somehow I cannot get it to work.
I have the query
curl -XGET 'http://localhost:9200/bookstore/user/_search?pretty=1' -d '{
"size": 0,
"from": 0,
"query": {
"filtered": {
"query": {
"terms": {
"gender": [
"Male"
]
}
}
}
},
"facets": {
"CategoryFacet": {
"terms": {
"field": "books.b_name",
"size": 5,
"shard_size": 1000,
"order": "count"
},
"nested": "books",
"facet_filter": {
"terms": {
"books.bcl_name": [
"Trivia"
]
}
}
}
}
}'
which returns a result, but I'm not sure whether this is correct. I looked for some examples, and found this (http://www.spacevatican.org/2012/6/3/fun-with-elasticsearch-s-children-and-nested-documents/) for example. I'm able to rewrite my query like this:
curl -XGET 'http://localhost:9200/bookstore/user/_search?pretty=1' -d '{
"size": 0,
"from": 0,
"query": {
"filtered": {
"query": {
"terms": {
"gender": [
"Male"
]
}
},
"filter": {
"nested": {
"path": "books",
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"and": [
{
"term": {
"books.bcl_name": "Trivia"
}
}
]
}
}
}
}
}
}
},
"facets": {
"CategoryFacet": {
"terms": {
"field": "books.b_name",
"size": 5,
"shard_size": 1000,
"order": "count"
},
"nested": "books"
}
}
}'
which shows different results.
As a beginner, I am a little lost right now. Can someone please give me a hint on how to solve this? Thanks a lot in advance!
The first query means:
Search for users whose gender is "Male".
But "CategoryFacet" counts only gender : "Male" AND books.bcl_name : "Trivia".
So in the result set you get all "Male" users, but your CategoryFacet gives you the count for "Male users whose books.bcl_name is Trivia".
In the second query, your "CategoryFacet" does not include any extra filtering. It just returns the facets for the exact result set.
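The difference can be modelled with a few lines of Python over in-memory data. This is only a simplified sketch of how I understand the two requests to behave; the sample users and book titles are invented, and the facet helper is hypothetical:

```python
from collections import Counter

# Hypothetical sample data shaped like the mapping in the question.
users = [
    {"gender": "Male", "books": [
        {"b_name": "Quiz Night", "bcl_name": "Trivia"},
        {"b_name": "Sea Tales", "bcl_name": "Fiction"}]},
    {"gender": "Male", "books": [
        {"b_name": "Sea Tales", "bcl_name": "Fiction"}]},
    {"gender": "Female", "books": [
        {"b_name": "Quiz Night", "bcl_name": "Trivia"}]},
]

def facet(hits, book_filter=lambda book: True):
    """Count b_name over the nested books of the hits, optionally filtered."""
    return Counter(b["b_name"] for u in hits for b in u["books"] if book_filter(b))

males = [u for u in users if u["gender"] == "Male"]

# First query: every male user is a hit; the facet_filter narrows the counted books.
first = facet(males, lambda b: b["bcl_name"] == "Trivia")

# Second query: the nested filter narrows the hits; the facet counts all their books.
trivia_males = [u for u in males
                if any(b["bcl_name"] == "Trivia" for b in u["books"])]
second = facet(trivia_males)

print(len(males), dict(first))
print(len(trivia_males), dict(second))
```

In this model the first request keeps both male users as hits but counts only Trivia titles, while the second keeps only the male user who owns a Trivia book and counts every title he owns, which is why the two facet results differ.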
