Elasticsearch: Searching for fields with mapping not_analyzed get no hits - node.js

I have elasticsearch running and do all my requests with nodejs.
I have the following mapping applied for my index "mastert4":
{
"mappings": {
"mastert4": {
"properties": {
"s": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
I added exactly one document to the index which looks pretty much like this:
{
"master": {
"vi": "ff155d9696818dde0627e14c79ba5d344c3ef01d",
"s": "Anne Will"
}
}
Now doing any of the following search queries will not return any hits:
{
"index": "mastert4",
"body": {
"query": {
"filtered": {
"query": {
"match"/"term": {
"s": "anne will"/"Anne Will"
}
}
}
}
}
}
But the following query will return the exact document:
{
"index": "mastert4",
"body": {
"query": {
"filtered": {
"query": {
"constant_score": {
"filter": [
{
"missing": {
"field": "s"
}
}
]
}
}
}
}
}
}
And if I search for
{
"exists": {
"field": "s"
}
}
I will get no hits again.
When analyzing the field itsself I get:
{
"tokens": [
{
"token": "Anne Will",
"start_offset": 0,
"end_offset": 9,
"type": "word",
"position": 1
}
]
}
I'm really in a dead end here. Can someone tell me where I did wrong? Thx!!!!

You've enclosed the fields s and vi inside an outer field called master which is not declared in your mapping. That's the reason. If you query for master.s, you'll get results.
The second solution is to remove the enclosing master object in your document and that will work also:
{
"vi": "ff155d9696818dde0627e14c79ba5d344c3ef01d",
"s": "Anne Will"
}

Related

Is there a Group BY function for finding result with elastic search query?

I have tried to integrate group by with elastic search. But I didn't get the answer properly. Please support me to fix this issue. Indexed data is,
data = [
{ "fruit":"apple", "taste":5, "timestamp":100},
{ "fruit":"pear", "taste":5, "timestamp":110},
{ "fruit":"apple", "taste":4, "timestamp":200},
{ "fruit":"pear", "taste":8, "timestamp":90},
{ "fruit":"banana", "taste":5, "timestamp":100}]`
My query is,
`myQuery = {"query": {
"match_all": {}
},
"aggs": {
"group_by_fruit": {
"terms": {
"field": "fruit.keyword"
},
}
}
}
It showing all 5 data in the output. Actually I nee d to get only 3 records. The expected result is,
[
{ "fruit":"apple", "taste":4, "timestamp":200},
{ "fruit":"pear", "taste":8, "timestamp":90},
{ "fruit":"banana", "taste":5, "timestamp":100}]
If you want to get the documents with distinct fruit fields having the largest timestamp value you should use a top_hits aggregation.
{
"query": {
"match_all": {}
},
"size": 0,
"aggs": {
"top_tags": {
"terms": {
"field": "fruit.keyword",
"size": <MAX_NUMBER_OF_DISTINCT_FRUITS>
},
"aggs": {
"group_by_fruit": {
"top_hits": {
"sort": [
{
"timestamp": {
"order": "desc"
}
}
],
"size" : 1
}
}
}
}
}
}

How to calculate total for each token in Elasticsearch

I have a request into Elastic
{
"query":{
"bool":{
"must":[
{
"query_string":{
"query":"something1 OR something2 OR something3",
"default_operator":"OR"
}
}
],
"filter":{
"range":{
"time":{
"gte":date
}
}
}
}
}
}
I wanna calculate count for each token in all documents using elastic search in one request, for example:
something1: 26 documents
something2: 12 documents
something3: 1 documents
Assuming that the tokens are not akin to enumerations (i.e. constrained set of specific values, like state names, which would make a terms aggregation your best bet with the right mapping), I think the closest thing to what you want would be to use filters aggregation:
POST your-index/_search
{
"query":{
"bool":{
"must":[
{
"query_string":{
"query":"something1 OR something2 OR something3",
"default_operator":"OR"
}
}
],
"filter":{
"range":{
"time":{
"gte":date
}
}
}
}
},
"aggs": {
"token_doc_counts": {
"filters" : {
"filters" : {
"something1" : {
"bool": {
"must": { "query_string" : { "query" : "something1" } },
"filter": { "range": { "time": { "gte": date } } }
}
},
"something2" : {
"bool": {
"must": { "query_string" : { "query" : "something2" } },
"filter": { "range": { "time": { "gte": date } } }
}
},
"something3" : {
"bool": {
"must": { "query_string" : { "query" : "something3" } },
"filter": { "range": { "time": { "gte": date } } }
}
}
}
}
}
}
}
The response would look something like:
{
"took": 9,
"timed_out": false,
"_shards": ...,
"hits": ...,
"aggregations": {
"token_doc_counts": {
"buckets": {
"something1": {
"doc_count": 1
},
"something2": {
"doc_count": 2
},
"something3": {
"doc_count": 3
}
}
}
}
}
You can split your query into filters aggregation of three filters. For reference look here: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-filters-aggregation.html
What you would need to do, is to create a Copy_To field and have the mapping as shown below.
Depending on the fields that your query_string queries, you need to include some or all of the fields with copy_to field.
By default query_string searches all the fields, so you may need to specify copy_to for all the fields as shown in below mapping, where for sake of simplicity, I've created only three fields, title, field_2 and a third field content which would act as copied to field.
Mapping
PUT <your_index_name>
{
"mappings": {
"mydocs": {
"properties": {
"title": {
"type": "text",
"copy_to": "content"
},
"field_2": {
"type": "text",
"copy_to": "content"
},
"content": {
"type": "text",
"fielddata": true
}
}
}
}
}
Sample Documents
POST <your_index_name>/mydocs/1
{
"title": "something1",
"field_2": "something2"
}
POST <your_index_name>/mydocs/2
{
"title": "something2",
"field_2": "something3"
}
Query:
You'd get the required document counts for the each and every token using the below aggregation query and I've made use of Terms Aggregation:
POST <your_index_name>/_search
{
"size": 0,
"query": {
"query_string": {
"query": "something1 OR something2 OR something3"
}
},
"aggs": {
"myaggs": {
"terms": {
"field": "content",
"include" : ["something1","something2","something3"]
}
}
}
}
Query Response:
{
"took": 7,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"skipped": 0,
"failed": 0
},
"hits": {
"total": 2,
"max_score": 0,
"hits": []
},
"aggregations": {
"myaggs": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "something2",
"doc_count": 2
},
{
"key": "something1",
"doc_count": 1
},
{
"key": "something3",
"doc_count": 1
}
]
}
}
}
Let me know if it helps!

Elasticsearch sort on multiple queries

I have a query like so:
{
"sort": [
{
"_geo_distance": {
"geo": {
"lat": 39.802763999999996,
"lon": -105.08748399999999
},
"order": "asc",
"unit": "mi",
"mode": "min",
"distance_type": "sloppy_arc"
}
}
],
"query": {
"bool": {
"minimum_number_should_match": 0,
"should": [
{
"match": {
"name": ""
}
},
{
"match": {
"credit": true
}
}
]
}
}
}
I want my search to always return ALL results, just sorted with those which have matching flags closer to the top.
I would like the sorting priority to go something like:
searchTerm (name, a string)
flags (credit/atm/ada/etc, boolean values)
distance
How can this be achieved?
So far, the query you see above is all I've gotten. I haven't been able to figure out how to always return all results, nor how to incorporate the additional queries into the sort.
I don't believe "sort" is the answer you are looking for, actually. I believe you need a trial-and-error approach starting with a simple "bool" query where you put all your criterias (name, flags, distance). Then you give your name criteria more weight (boost) then a little bit less to your flags and even less to the distance calculation.
A "bool" "should" would be able to give you a sorted list of documents based on the _score of each and, depending on how you score each criteria, the _score is being influenced more or less.
Also, returning ALL the elements is not difficult: just add a "match_all": {} to your "bool" "should" query.
This would be a starting point, from my point of view, and, depending on your documents and your requirements (see my comment to your post about the confusion) you would need to adjust the "boost" values and test, adjust again and test again etc:
{
"query": {
"bool": {
"should": [
{ "constant_score": {
"boost": 6,
"query": {
"match": { "name": { "query": "something" } }
}
}},
{ "constant_score": {
"boost": 3,
"query": {
"match": { "credit": { "query": true } }
}
}},
{ "constant_score": {
"boost": 3,
"query": {
"match": { "atm": { "query": false } }
}
}},
{ "constant_score": {
"boost": 3,
"query": {
"match": { "ada": { "query": true } }
}
}},
{ "constant_score": {
"query": {
"function_score": {
"functions": [
{
"gauss": {
"geo": {
"origin": {
"lat": 39.802763999999996,
"lon": -105.08748399999999
},
"offset": "2km",
"scale": "3km"
}
}
}
]
}
}
}
},
{
"match_all": {}
}
]
}
}
}

Elasticsearch lowercase filter search

I'm trying to search my database and be able to use upper/lower case filter terms but I've noticed while query's apply analyzers, I can't figure out how to apply a lowercase analyzer on a filtered search. Here's the query:
{
"query": {
"filtered": {
"filter": {
"bool": {
"should": [
{
"term": {
"language": "mandarin" // Returns a doc
}
},
{
"term": {
"language": "Italian" // Does NOT return a doc, but will if lowercased
}
}
]
}
}
}
}
}
I have a type languages that I have lowercased using:
"analyzer": {
"lower_keyword": {
"type": "custom",
"tokenizer": "keyword",
"filter": "lowercase"
}
}
and a corresponding mapping:
"mappings": {
"languages": {
"_id": {
"path": "languageID"
},
"properties": {
"languageID": {
"type": "integer"
},
"language": {
"type": "string",
"analyzer": "lower_keyword"
},
"native": {
"type": "string",
"analyzer": "keyword"
},
"meta": {
"type": "nested"
},
"language_suggest": {
"type": "completion"
}
}
}
}
The problem is that you have a field that you have analyzed during index to lowercase it, but you are using a term filter for the query which is not analyzed:
Term Filter
Filters documents that have fields that contain a term (not analyzed).
Similar to term query, except that it acts as a filter.
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-term-filter.html
I'd try using a query filter instead:
Query Filter
Wraps any query to be used as a filter. Can be placed within queries
that accept a filter.
Example:
{
"constantScore" : {
"filter" : {
"query" : {
"query_string" : {
"query" : "this AND that OR thus"
}
}
}
} }
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-filter.html#query-dsl-query-filter
This may be achieved by appending .keyword to your field to query against the keyword version of the field. Assuming language was defined in the mapping with type keyword.
Note that now only the exact text would match: mandarin won't match and Italian would.
Your query would end up like this:
{
"query": {
"filtered": {
"filter": {
"bool": {
"should": [
{
"term": {
"language.keyword": "mandarin" // Returns Empty
}
},
{
"term": {
"language.keyword": "Italian" // Returns Italian.
}
}
]
}
}
}
}
}
Combining the term values is also allowed:
{
"query": {
"filtered": {
"filter": {
"bool": {
"should": [
{
"term": {
"language.keyword":
["mandarin", "Italian"]
}
}
]
}
}
}
}
}

Elasticsearch wildcard search on not_analyzed field

I have an index like following settings and mapping;
{
"settings":{
"index":{
"analysis":{
"analyzer":{
"analyzer_keyword":{
"tokenizer":"keyword",
"filter":"lowercase"
}
}
}
}
},
"mappings":{
"product":{
"properties":{
"name":{
"analyzer":"analyzer_keyword",
"type":"string",
"index": "not_analyzed"
}
}
}
}
}
I am struggling with making an implementation for wildcard search on name field. My example data like this;
[
{"name": "SVF-123"},
{"name": "SVF-234"}
]
When I perform following query;
http://localhost:9200/my_index/product/_search -d '
{
"query": {
"filtered" : {
"query" : {
"query_string" : {
"query": "*SVF-1*"
}
}
}
}
}'
It returns SVF-123,SVF-234. I think, it still tokenizes data. It must return only SVF-123.
Could you please help on this?
Thanks in advance
There's a couple of things going wrong here.
First, you are saying that you don't want terms analyzed index time. Then, there's an analyzer configured (that's used search time) that generates incompatible terms. (They are lowercased)
By default, all terms end up in the _all-field with the standard analyzer. That is where you end up searching. Since it tokenizes on "-", you end up with an OR of "*SVF" and "1*".
Try to do a terms facet on _all and on name to see what's going on.
Here's a runnable Play and gist: https://www.found.no/play/gist/3e5fcb1b4c41cfc20226 (https://gist.github.com/alexbrasetvik/3e5fcb1b4c41cfc20226)
You need to make sure the terms you index is compatible with what you search for. You probably want to disable _all, since it can muddy what's going on.
#!/bin/bash
export ELASTICSEARCH_ENDPOINT="http://localhost:9200"
# Create indexes
curl -XPUT "$ELASTICSEARCH_ENDPOINT/play" -d '{
"settings": {
"analysis": {
"text": [
"SVF-123",
"SVF-234"
],
"analyzer": {
"analyzer_keyword": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase"
]
}
}
}
},
"mappings": {
"type": {
"properties": {
"name": {
"type": "string",
"index": "not_analyzed",
"analyzer": "analyzer_keyword"
}
}
}
}
}'
# Index documents
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_bulk?refresh=true" -d '
{"index":{"_index":"play","_type":"type"}}
{"name":"SVF-123"}
{"index":{"_index":"play","_type":"type"}}
{"name":"SVF-234"}
'
# Do searches
# See all the generated terms.
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
"facets": {
"name": {
"terms": {
"field": "name"
}
},
"_all": {
"terms": {
"field": "_all"
}
}
}
}
'
# Analyzed, so no match
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
"query": {
"match": {
"name": {
"query": "SVF-123"
}
}
}
}
'
# Not analyzed according to `analyzer_keyword`, so matches. (Note: term, not match)
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
"query": {
"term": {
"name": {
"value": "SVF-123"
}
}
}
}
'
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
"query": {
"term": {
"_all": {
"value": "svf"
}
}
}
}
'
My solution adventure
I have started my case as you can see in my question. Whenever, I have changed a part of my settings, one part started to work, but another part stop working. Let me give my solution history:
1.) I have indexed my data as default. This means, my data is analyzed as default. This will cause problem on my side. For example;
When user started to search a keyword like SVF-1, system run this query:
{
"query": {
"filtered" : {
"query" : {
"query_string" : {
"analyze_wildcard": true,
"query": "*SVF-1*"
}
}
}
}
}
and results;
SVF-123
SVF-234
This is normal, because name field of my documents are analyzed. This splits query into tokens SVF and 1, and SVF matches my documents, although 1 does not match. I have skipped this way. I have create a mapping for my fields make them not_analyzed
{
"mappings":{
"product":{
"properties":{
"name":{
"type":"string",
"index": "not_analyzed"
},
"site":{
"type":"string",
"index": "not_analyzed"
}
}
}
}
}
but my problem continued.
2.) I wanted to try another way after lots of research. Decided to use wildcard query.
My query is;
{
"query": {
"wildcard" : {
"name" : {
"value" : *SVF-1*"
}
}
},
"filter":{
"term": {"site":"pro_en_GB"}
}
}
}
This query worked, but one problem here. My fields are not_analyzed anymore, and I am making wildcard query. Case sensitivity is problem here. If I search like svf-1, it returns nothing. Since, user can input lowercase version of query.
3.) I have changed my document structure to;
{
"mappings":{
"product":{
"properties":{
"name":{
"type":"string",
"index": "not_analyzed"
},
"nameLowerCase":{
"type":"string",
"index": "not_analyzed"
}
"site":{
"type":"string",
"index": "not_analyzed"
}
}
}
}
}
I have adde one more field for name called nameLowerCase. When I am indexing my document, I am setting my document like;
{
name: "SVF-123",
nameLowerCase: "svf-123",
site: "pro_en_GB"
}
Here, I am converting query keyword to lowercase and make search operation on new nameLowerCase index. And displaying name field.
Final version of my query is;
{
"query": {
"wildcard" : {
"nameLowerCase" : {
"value" : "*svf-1*"
}
}
},
"filter":{
"term": {"site":"pro_en_GB"}
}
}
}
Now it works. There is also one way to solve this problem by using multi_field. My query contains dash(-), and faced some problems.
Lots of thanks to #Alex Brasetvik for his detailed explanation and effort
Adding to Hüseyin answer, we can use AND as the default operator. So SVF and 1* will be joined using AND operator, therefore giving us the correct results.
"query": {
"filtered" : {
"query" : {
"query_string" : {
"default_operator": "AND",
"analyze_wildcard": true,
"query": "*SVF-1*"
}
}
}
}
#Viduranga Wijesooriya as you stated "default_operator" : "AND" will check for presence of both SVF and 1 but exact match alone is still not possible,
but ya this will filter the results in more appropriate way leaving with all combination of SVF and 1 and sorting the results by relevance which will promote SVF-1 up the order
For pulling out the exact result
"settings": {
"analysis": {
"analyzer": {
"analyzer_keyword": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase"
]
}
}
}
},
"mappings": {
"type": {
"properties": {
"name": {
"type": "string",
"analyzer": "analyzer_keyword"
}
}
}
}
and the query is
{
"query": {
"bool": {
"must": [
{
"query_string" : {
"fields": ["name"],
"query" : "*svf-1*",
"analyze_wildcard": true
}
}
]
}
}
}
result
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "play",
"_type": "type",
"_id": "AVfXzn3oIKphDu1OoMtF",
"_score": 1,
"_source": {
"name": "SVF-123"
}
}
]
}
}

Resources