Just to make things clear: this is my first day working with Elasticsearch. Now, on to the problem.
I started by creating my index with:
curl -XPUT "http://localhost:9200/users" -d'
{
"mappings": {
"user": {
"properties": {
"education": {
"type": "nested"
},
"job": {
"type": "nested"
}
}
}
}
}'
and then indexed a document with:
curl -XPOST "http://localhost:9200/users/user/" -d'
{
"name": "User A",
"education": [
{
"school": "School A1",
"course": "Course A1"
},
{
"school": "School A2",
"course": "Course A2"
}
]
}'
The problem I'm facing now is with the query. I'm trying to get results with:
curl -XPOST "http://localhost:9200/users/user/_search?pretty" -d'
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"nested": {
"path": "education",
"filter": {
"bool": {
"must": [
{
"term": {
"education.school": "School A1"
}
}
]
}
}
}
}
}
}
}'
But nothing is getting returned.
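As per the mapping you provided, the school field is analyzed.
Analyzed means the standard analyzer splits the text School A1 on whitespace and lowercases it, so it is indexed as the two terms school and a1.
You are searching with a term query, which looks for an exact, unanalyzed term, so "School A1" matches neither of them. See the term query documentation for details.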
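To make the analysis point concrete, here is a rough Python model, not Elasticsearch itself. It assumes a simplified standard analyzer (split into word tokens, lowercase); the helpers standard_analyze, term_query and query_string_and are hypothetical names used only for illustration.

```python
import re


def standard_analyze(text):
    """Rough stand-in for the standard analyzer: split into word tokens and lowercase them.
    The real standard tokenizer follows Unicode word-break rules; this is a simplification."""
    return [token.lower() for token in re.findall(r"\w+", text)]


def term_query(value, indexed_terms):
    # A term query is not analyzed: the whole input must equal one indexed term.
    return value in indexed_terms


def query_string_and(text, indexed_terms):
    # query_string analyzes its input; with default_operator AND every token must be indexed.
    return all(token in indexed_terms for token in standard_analyze(text))


# Terms indexed for each nested education.school value of the sample document.
indexed = {school: set(standard_analyze(school)) for school in ["School A1", "School A2"]}

for school, terms in indexed.items():
    print(school, sorted(terms),
          "term:", term_query("School A1", terms),
          "query_string AND:", query_string_and("School A1", terms))
```

In this model the term query never matches, while the AND query_string matches only the School A1 object, which is the behaviour the answer describes.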
You can use a query_string query with default_operator set to AND:
curl -XPOST "http://localhost:9200/users/user/_search?pretty" -d'
{
"query": {
"filtered": {
"query": {
"match_all": {}
},
"filter": {
"nested": {
"path": "education",
"filter": {
"bool": {
"must": [
{
"query": {
"query_string": {
"default_field": "education.school",
"query": "School A1",
"default_operator": "AND"
}
}
}
]
}
}
}
}
}
}
}'
Just leaving my 2 cents here: I would avoid the filtered query, as it was deprecated in Elasticsearch 2.0 and removed in 5.0.
I'll rewrite the above query without the filtered query:
curl -XPOST "http://localhost:9200/users/user/_search?pretty" -d'
{
"query": {
"nested": {
"path": "education",
"query": {
"bool": {
"must": [
{
"query_string": {
"default_field": "education.school",
"query": "School A1",
"default_operator": "AND"
}
}
]
}
}
}
}
}'
I followed the documentation to write the above query.
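If you have many old queries to migrate, the top-level rewrite is mechanical: the filtered query's "query" part goes into bool.must and its "filter" part into bool.filter. Below is a minimal Python sketch of that rewrite; filtered_to_bool is a hypothetical helper, it only converts the top level, and the example filter value is made up.

```python
import json


def filtered_to_bool(query):
    """Rewrite a top-level filtered query as the equivalent bool query.

    Only the top level is converted: "query" moves to bool.must and "filter" to
    bool.filter. Clauses inside them (for example a nested filter's inner "filter"
    key, which becomes "query" in a nested query) are left untouched."""
    if "filtered" not in query:
        return query
    inner = query["filtered"]
    bool_body = {}
    if "query" in inner:
        bool_body["must"] = inner["query"]
    if "filter" in inner:
        bool_body["filter"] = inner["filter"]
    return {"bool": bool_body}


old = {"filtered": {"query": {"match_all": {}},
                    "filter": {"term": {"education.school": "school"}}}}
print(json.dumps(filtered_to_bool(old)))
```

The resulting dict can be serialized and sent as the request body with any HTTP client.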
{
"query": {
"bool": {
"must": {
"bool": {
"must": [
{
"multi_match": {
"query": [
"869336","45345345"
],
"type": "phrase_prefix",
"fields": [
"id",
"accountNo",
"moblileNo"
]
}
}
],
"should": [],
"must_not": []
}
},
"filter": {
"bool": {
"must": {
"bool": {
"must": [],
"should": [],
"must_not": []
}
}
}
}
}
}
}
I need to use a multi_match query only. I have included some sample fields in the query. I get the error below when I run it in Postman:
[multi_match] unknown token [START_ARRAY] after [query]
The error occurs because query accepts only a single input string, not an array.
That string can contain multiple values separated by spaces, like "query": "869336 45345345". To see how such a string is handled, go through the multi_match query documentation.
Now, looking at your scenario and assuming you want to apply phrase_prefix queries for both values, i.e. 869336 and 45345345, you just need to split the values into individual multi_match queries:
POST <your_index_name>/_search
{
"query": {
"bool": {
"must": {
"bool": {
"must": [
{
"multi_match": {
"query": "869336",
"type": "phrase_prefix",
"fields": [
"accountNo",
"moblileNo"
]
}
},
{
"multi_match": {
"query": "45345345",
"type": "phrase_prefix",
"fields": [
"accountNo",
"moblileNo"
]
}
}
],
"should": [],
"must_not": []
}
},
"filter": {
"bool": {
"must": {
"bool": {
"must": [],
"should": [],
"must_not": []
}
}
}
}
}
}
}
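If the list of values comes from user input, the per-value clauses can be generated instead of written by hand. A minimal Python sketch, assuming you build the request body in your own code before sending it; per_value_multi_match is a hypothetical helper, and it leaves out the empty should/must_not/filter sections, which have no effect.

```python
import json


def per_value_multi_match(values, fields, match_type="phrase_prefix"):
    """Build one multi_match clause per value and require all of them (bool.must).

    multi_match's "query" takes a single string, so a list of values has to be
    split into separate clauses like this."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"multi_match": {"query": value, "type": match_type, "fields": list(fields)}}
                    for value in values
                ]
            }
        }
    }


body = per_value_multi_match(["869336", "45345345"], ["accountNo", "moblileNo"])
print(json.dumps(body, indent=2))
```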
Now, if you do not want to apply phrase_prefix and instead want to return all documents that have both values in any of the fields, you can simply write the query as below:
POST my-multimatch-index/_search
{
"query": {
"bool": {
"must": {
"bool": {
"must": [
{
"multi_match": {
"query": "869336 45345345",
"fields": [
"accountNo",
"moblileNo"
],
"type": "cross_fields",
"operator": "and"
}
}
],
"should": [],
"must_not": []
}
},
"filter": {
"bool": {
"must": {
"bool": {
"must": [],
"should": [],
"must_not": []
}
}
}
}
}
}
}
Note the use of "type": "cross_fields" together with "operator": "and": cross_fields treats the listed fields as one combined field, and "and" means only documents containing both values are returned. It's best to go through the cross_fields and operator documentation to get a better understanding.
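To see why cross_fields matters here, the following plain-Python model contrasts it with the default best_fields type when operator is and. It is a simplification (whitespace analysis, no scoring), and the two sample documents are hypothetical.

```python
def analyze(text):
    # Simplified analysis: whitespace split + lowercase (numbers pass through unchanged).
    return text.lower().split()


def cross_fields_and(query, doc, fields):
    # Term-centric: every query term must appear in at least one of the fields.
    field_terms = [set(analyze(doc.get(f, ""))) for f in fields]
    return all(any(term in terms for terms in field_terms) for term in analyze(query))


def best_fields_and(query, doc, fields):
    # Field-centric: some single field must contain every query term.
    terms = analyze(query)
    return any(all(t in set(analyze(doc.get(f, ""))) for t in terms) for f in fields)


FIELDS = ["accountNo", "moblileNo"]
doc_both = {"accountNo": "869336", "moblileNo": "45345345"}
doc_one = {"accountNo": "869336", "moblileNo": "111111"}

print(cross_fields_and("869336 45345345", doc_both, FIELDS))  # True
print(cross_fields_and("869336 45345345", doc_one, FIELDS))   # False
print(best_fields_and("869336 45345345", doc_both, FIELDS))   # False
```

With best_fields and operator and, a document holding each value in a different field would be missed, which is why cross_fields is used.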
Anyway, feel free to ask questions. I hope this helps!
I have documents like this.
{
"a":"test",
"b":"harry"
},
{
"a":"",
"b":"jack"
}
I need to update the documents in a given index where field a is "" (an empty string), setting it to a default value, say null.
Any help is appreciated. Thanks!
Use update by query with an ingest pipeline.
_update_by_query can also use the ingest node feature if you specify a pipeline, like this.
First, define the pipeline:
PUT _ingest/pipeline/set-foo
{
"description" : "sets foo",
"processors" : [ {
"set" : {
"field": "a",
"value": null
}
} ]
}
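Then you can use it like this. The filtered query was removed in Elasticsearch 5.0, the same version that introduced ingest pipelines, so use a bool filter instead. This version assumes the default dynamic mapping, where a string field a also gets an a.keyword subfield that stores "" as an exact term:
POST myindex/_update_by_query?pipeline=set-foo
{
"query": {
"bool": {
"filter": {
"term": {
"a.keyword": ""
}
}
}
}
}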
OR
POST myindex/_update_by_query?pipeline=set-foo
{
"query": {
"bool" : {
"must" : {
"script" : {
"script" : {
"inline": "doc['a'].empty",
"lang": "painless"
}
}
}
}
}
}
To query documents where a field holds an empty string, i.e. "", I did:
"query": {
"bool": {
"must": [
{
"exists": {
"field": "a"
}
}
],
"must_not": [
{
"wildcard": {
"a": "*"
}
}
]
}
}
So the overall query to update all docs where field a is "" is:
POST test11/_update_by_query
{
"script": {
"inline": "ctx._source.a=null",
"lang": "painless"
},
"query": {
"bool": {
"must": [
{
"exists": {
"field": "a"
}
}
],
"must_not": [
{
"wildcard": {
"a": "*"
}
}
]
}
}
}
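As a sanity check of what the update is meant to do, here is a plain-Python model of its intended effect. It does not reproduce how Elasticsearch evaluates the exists/wildcard clauses; the helper names are hypothetical, and the third document (without field a) is a made-up example showing that documents missing the field are left alone.

```python
def select_empty(docs, field):
    # Mirrors the intent of the query above: the field is present and equals "".
    return [doc for doc in docs if doc.get(field) == ""]


def update_empty_to_null(docs, field):
    # Mirrors the script: ctx._source.a = null on every selected document.
    for doc in select_empty(docs, field):
        doc[field] = None
    return docs


# The two documents from the question plus a hypothetical one without field a.
docs = [{"a": "test", "b": "harry"}, {"a": "", "b": "jack"}, {"b": "no field a"}]
print(update_empty_to_null(docs, "a"))
# [{'a': 'test', 'b': 'harry'}, {'a': None, 'b': 'jack'}, {'b': 'no field a'}]
```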
I have 2 documents in Elasticsearch with the structure below:
Document 1:
{
"specification": [
{
"name": "Processor",
"value": "Intel"
},
{
"name": "RAM",
"value": "2GB"
}
]
}
Document 2:
{
"specification": [
{
"name": "Processor",
"value": "Intel"
},
{
"name": "RAM",
"value": "3GB"
}
]
}
I want to get the document that has a specification with the values Intel and 2GB, i.e. the 1st document. But when I use must (AND operator) I get nothing, and if I use should (OR operator) I get both documents. Can anyone help me with this? Below is my query:
{
"query": {
"nested": {
"path": "specification",
"query": {
"bool": {
"must": [
{
"bool": {
"must": [
{ "match": { "specification.name": "Processor" }},
{ "match": { "specifications.value": "Intel" }}
]
}
},
{
"bool": {
"must": [
{ "match": { "specification.name": "RAM" }},
{ "match": { "specifications.value": "2GB" }}
]
}
}
]
}
}
}
}
}
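Each nested query is evaluated against one nested object at a time, so a single nested query that requires both Processor/Intel and RAM/2GB can never match: no single specification entry has both names. Use one nested query per condition and combine them in an outer bool must. Note also that the field is specification.value, not specifications.value as in your query. Try this one: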
{
"query": {
"bool": {
"must": [
{
"nested": {
"path": "specification",
"query": {
"bool": {
"must": [
{
"match": {
"specification.name": "Processor"
}
},
{
"match": {
"specification.value": "Intel"
}
}
]
}
}
}
},
{
"nested": {
"path": "specification",
"query": {
"bool": {
"must": [
{
"match": {
"specification.name": "RAM"
}
},
{
"match": {
"specification.value": "2GB"
}
}
]
}
}
}
}
]
}
}
}
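The difference between the two queries can be modelled in a few lines of Python. This is a sketch under simplifying assumptions: exact string comparison stands in for the match clauses, and the helpers nested and spec are hypothetical.

```python
DOCS = {
    1: [{"name": "Processor", "value": "Intel"}, {"name": "RAM", "value": "2GB"}],
    2: [{"name": "Processor", "value": "Intel"}, {"name": "RAM", "value": "3GB"}],
}


def nested(objects, predicate):
    # A nested query matches the parent if at least ONE nested object satisfies it.
    return any(predicate(obj) for obj in objects)


def spec(name, value):
    # Exact comparison stands in for the two match clauses (a simplification).
    return lambda obj: obj["name"] == name and obj["value"] == value


processor_intel = spec("Processor", "Intel")
ram_2gb = spec("RAM", "2GB")

# Original query: both conditions inside one nested query, so a single object
# would have to be Processor/Intel and RAM/2GB at the same time.
one_nested = [doc_id for doc_id, objs in DOCS.items()
              if nested(objs, lambda o: processor_intel(o) and ram_2gb(o))]

# Answer: one nested query per condition, combined with an outer bool must.
two_nested = [doc_id for doc_id, objs in DOCS.items()
              if nested(objs, processor_intel) and nested(objs, ram_2gb)]

print(one_nested, two_nested)  # [] [1]
```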
I am new to Elasticsearch, and right now I am trying to figure out why my synonyms are not returning any results like I expect them to.
I created a custom synonym filter and analyzer, and applied the analyzer both to the _all field and explicitly to the specialty field.
When I search for "specialty": "aids" without the analyzer/tokenizer, it gives me zero results, as expected.
However, when I search for "specialty": "aids" with the analyzer/tokenizer, I expect it to give me the same results as searching for "specialty": "retrovirology", which should yield 3 results, but it comes back with nothing.
Is there something wrong with how I am approaching this?
Here are my settings and some sample data:
curl -XDELETE "http://localhost:9200/personsearch"
curl -XPUT "http://localhost:9200/personsearch" -d'
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"XYZSynAnalyzer": {
"tokenizer": "standard",
"filter": [
"XYZSynFilter"
]
}
},
"filter": {
"XYZSynFilter": {
"type": "synonym",
"synonyms": [
"aids, retrovirology"
]
}
}
}
}
},
"mappings": {
"xyzemployee": {
"_all": {
"analyzer": "XYZSynAnalyzer"
},
"properties": {
"firstName": {
"type": "string"
},
"lastName": {
"type": "string"
},
"middleName": {
"type": "string",
"include_in_all": false,
"index": "not_analyzed"
},
"specialty": {
"type": "string",
"analyzer": "XYZSynAnalyzer"
}
}
}
}
}'
curl -XPUT "http://localhost:9200/personsearch/xyzemployee/1" -d'
{
"firstName": "Don",
"middleName": "W.",
"lastName": "White",
"specialty": "Adult Retrovirology"
}'
curl -XPUT "http://localhost:9200/personsearch/xyzemployee/2" -d'
{
"firstName": "Terrance",
"middleName": "G.",
"lastName": "Gartner",
"specialty": "Retrovirology"
}'
curl -XPUT "http://localhost:9200/personsearch/xyzemployee/3" -d'
{
"firstName": "Carter",
"middleName": "L.",
"lastName": "Taylor",
"specialty": "Pediatric Retrovirology"
}'
# Why is this returning nothing?
curl -XGET "http://localhost:9200/personsearch/xyzemployee/_search?pretty=true" -d'
{
"query": {
"match": {
"specialty": "retrovirology"
}
}
}'
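You aren't lowercasing anywhere. The standard tokenizer on its own keeps the original case, so "Retrovirology" is indexed as Retrovirology, while your query text and the synonym rule are lowercase, so the terms never match. Add a lowercase filter before the synonym filter.
Try this: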
{
"settings": {
"index": {
"analysis": {
"analyzer": {
"XYZSynAnalyzer": {
"tokenizer": "standard",
"filter": [
"lowercase", "XYZSynFilter"
]
}
},
"filter": {
"XYZSynFilter": {
"type": "synonym",
"synonyms": [
"aids, retrovirology"
]
}
}
}
}
}
Note: you may want to use separate index and search analyzers and apply the synonym filter in only one of them. Expanding synonyms only at index time will speed up searches.
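Here is a small Python model of the two analyzer chains. It is a simplification: a regex split stands in for the standard tokenizer, and the synonym filter is modelled as set expansion that ignores token positions. The names analyze and hits are hypothetical.

```python
import re

SYNONYM_GROUPS = [{"aids", "retrovirology"}]  # from the rule "aids, retrovirology"


def analyze(text, lowercase):
    # Standard-tokenizer stand-in: split into words, keeping the original case.
    tokens = re.findall(r"\w+", text)
    if lowercase:
        tokens = [t.lower() for t in tokens]
    expanded = set()
    for t in tokens:
        # Synonym filter stand-in: a token in a group expands to the whole group.
        group = next((g for g in SYNONYM_GROUPS if t in g), {t})
        expanded |= group
    return expanded


SPECIALTIES = ["Adult Retrovirology", "Retrovirology", "Pediatric Retrovirology"]


def hits(query, lowercase):
    # match query (default OR): a document matches if it shares any term with the query.
    q = analyze(query, lowercase)
    return sum(1 for s in SPECIALTIES if analyze(s, lowercase) & q)


print(hits("aids", lowercase=False), hits("aids", lowercase=True))  # 0 3
```

Without lowercasing, the indexed token Retrovirology never equals the lowercase synonym terms, so even searching for retrovirology finds nothing.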
I have an index with the following settings and mapping:
{
"settings":{
"index":{
"analysis":{
"analyzer":{
"analyzer_keyword":{
"tokenizer":"keyword",
"filter":"lowercase"
}
}
}
}
},
"mappings":{
"product":{
"properties":{
"name":{
"analyzer":"analyzer_keyword",
"type":"string",
"index": "not_analyzed"
}
}
}
}
}
I am struggling to implement wildcard search on the name field. My example data looks like this:
[
{"name": "SVF-123"},
{"name": "SVF-234"}
]
When I perform the following query:
curl -XPOST "http://localhost:9200/my_index/product/_search" -d '
{
"query": {
"filtered" : {
"query" : {
"query_string" : {
"query": "*SVF-1*"
}
}
}
}
}'
It returns both SVF-123 and SVF-234. I think it still tokenizes the data; it should return only SVF-123.
Could you please help with this?
Thanks in advance.
There are a couple of things going wrong here.
First, you are saying that you don't want terms analyzed at index time. Then there's an analyzer configured (which is used at search time) that generates incompatible terms: they are lowercased.
Second, by default all terms also end up in the _all field, analyzed with the standard analyzer, and that is where a query_string query without a field ends up searching. Since the standard analyzer tokenizes on "-", you end up with an OR of "*SVF" and "1*".
Try a terms facet on _all and on name to see what's going on.
Here's a runnable Play and gist: https://www.found.no/play/gist/3e5fcb1b4c41cfc20226 (https://gist.github.com/alexbrasetvik/3e5fcb1b4c41cfc20226)
You need to make sure the terms you index are compatible with what you search for. You probably want to disable _all, since it can muddy what's going on.
#!/bin/bash
export ELASTICSEARCH_ENDPOINT="http://localhost:9200"
# Create indexes
curl -XPUT "$ELASTICSEARCH_ENDPOINT/play" -d '{
"settings": {
"analysis": {
"text": [
"SVF-123",
"SVF-234"
],
"analyzer": {
"analyzer_keyword": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase"
]
}
}
}
},
"mappings": {
"type": {
"properties": {
"name": {
"type": "string",
"index": "not_analyzed",
"analyzer": "analyzer_keyword"
}
}
}
}
}'
# Index documents
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_bulk?refresh=true" -d '
{"index":{"_index":"play","_type":"type"}}
{"name":"SVF-123"}
{"index":{"_index":"play","_type":"type"}}
{"name":"SVF-234"}
'
# Do searches
# See all the generated terms.
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
"facets": {
"name": {
"terms": {
"field": "name"
}
},
"_all": {
"terms": {
"field": "_all"
}
}
}
}
'
# match analyzes the input with analyzer_keyword (lowercasing it to svf-123), so it doesn't match the indexed term SVF-123
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
"query": {
"match": {
"name": {
"query": "SVF-123"
}
}
}
}
'
# term is not analyzed, so SVF-123 is looked up as-is and matches the not_analyzed term (note: term, not match)
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '
{
"query": {
"term": {
"name": {
"value": "SVF-123"
}
}
}
}
'
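# _all uses the standard analyzer, which splits on "-" and lowercases, so both documents contain the term svf
curl -XPOST "$ELASTICSEARCH_ENDPOINT/_search?pretty" -d '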
{
"query": {
"term": {
"_all": {
"value": "svf"
}
}
}
}
'
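The _all behaviour can be reproduced with a small Python model using fnmatch for the wildcards. It is a sketch, not Elasticsearch: a regex split stands in for the standard analyzer, and the patterns are lowercased on the assumption that query_string lowercases expanded wildcard terms, which was its default (lowercase_expanded_terms) in the versions discussed here. The second list shows what a keyword tokenizer plus lowercase filter would produce if it were applied at index time.

```python
import re
from fnmatch import fnmatchcase

NAMES = ["SVF-123", "SVF-234"]


def standard_terms(text):
    # What the standard analyzer puts in _all: word tokens split on "-", lowercased.
    return [t.lower() for t in re.findall(r"\w+", text)]


def keyword_lowercase_terms(text):
    # keyword tokenizer + lowercase filter: the whole value as a single term.
    return [text.lower()]


def wildcard_or(patterns, terms):
    # query_string default_operator OR: any pattern matching any term is enough.
    return any(fnmatchcase(term, p) for p in patterns for term in terms)


# "*SVF-1*" split on "-" into "*SVF" OR "1*" (lowercased, see the assumption above).
or_hits = [n for n in NAMES if wildcard_or(["*svf", "1*"], standard_terms(n))]
# A single lowercased term per value keeps the dash, so the whole pattern applies.
exact_hits = [n for n in NAMES if wildcard_or(["*svf-1*"], keyword_lowercase_terms(n))]
print(or_hits)     # ['SVF-123', 'SVF-234']
print(exact_hits)  # ['SVF-123']
```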
My solution adventure
I started from the case you can see in my question. Whenever I changed one part of my settings, one part started to work but another stopped working. Here is my solution history:
1.) I indexed my data with the default settings, which means my data was analyzed. This caused a problem on my side. For example,
when a user searched for a keyword like SVF-1, the system ran this query:
{
"query": {
"filtered" : {
"query" : {
"query_string" : {
"analyze_wildcard": true,
"query": "*SVF-1*"
}
}
}
}
}
and the results were:
SVF-123
SVF-234
This is expected, because the name field of my documents is analyzed. The query is split into the tokens SVF and 1, and SVF matches both documents even though 1 does not. So I abandoned this approach and created a mapping that makes my fields not_analyzed:
{
"mappings":{
"product":{
"properties":{
"name":{
"type":"string",
"index": "not_analyzed"
},
"site":{
"type":"string",
"index": "not_analyzed"
}
}
}
}
}
but my problem continued.
2.) After a lot of research, I wanted to try another way and decided to use a wildcard query.
My query is:
{
"query": {
"wildcard" : {
"name" : {
"value" : "*SVF-1*"
}
}
},
"filter":{
"term": {"site":"pro_en_GB"}
}
}
This query worked, but there is one problem. My fields are no longer analyzed and I am running a wildcard query, so case sensitivity is an issue: if I search for svf-1, it returns nothing, and users may type the lowercase version of the query.
3.) I changed my mapping to:
{
"mappings":{
"product":{
"properties":{
"name":{
"type":"string",
"index": "not_analyzed"
},
"nameLowerCase":{
"type":"string",
"index": "not_analyzed"
},
"site":{
"type":"string",
"index": "not_analyzed"
}
}
}
}
}
I added one more field for the name, called nameLowerCase. When indexing, I set my document like this:
{
"name": "SVF-123",
"nameLowerCase": "svf-123",
"site": "pro_en_GB"
}
At search time, I convert the query keyword to lowercase and run the search against the new nameLowerCase field, while displaying the name field.
The final version of my query is:
{
"query": {
"wildcard" : {
"nameLowerCase" : {
"value" : "*svf-1*"
}
}
},
"filter":{
"term": {"site":"pro_en_GB"}
}
}
Now it works. This problem can also be solved with a multi_field. My query contains a dash (-), which caused me some problems along the way.
Many thanks to @Alex Brasetvik for his detailed explanation and effort.
Adding to Hüseyin's answer, we can use AND as the default operator, so SVF and 1* are joined with the AND operator, which gives the correct results.
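The lowercase-copy approach can be sketched in Python to show the flow end to end. This is a model only: fnmatchcase stands in for a case-sensitive wildcard query on a not_analyzed field, and index_doc and search are hypothetical helpers, not part of any Elasticsearch client.

```python
from fnmatch import fnmatchcase


def index_doc(name, site):
    # Index a lowercased copy next to the original, as done with nameLowerCase.
    return {"name": name, "nameLowerCase": name.lower(), "site": site}


def search(docs, user_input, site):
    # Lowercase the user's keyword, wildcard-match the lowercased copy,
    # filter on site, and return the original name for display.
    pattern = "*" + user_input.lower() + "*"
    return [d["name"] for d in docs
            if d["site"] == site and fnmatchcase(d["nameLowerCase"], pattern)]


docs = [index_doc("SVF-123", "pro_en_GB"), index_doc("SVF-234", "pro_en_GB")]
print(search(docs, "svf-1", "pro_en_GB"))  # ['SVF-123']
print(search(docs, "SVF-1", "pro_en_GB"))  # ['SVF-123']
# Without the lowercased copy, a case-sensitive wildcard misses lowercase input:
print([d["name"] for d in docs if fnmatchcase(d["name"], "*svf-1*")])  # []
```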
"query": {
"filtered" : {
"query" : {
"query_string" : {
"default_operator": "AND",
"analyze_wildcard": true,
"query": "*SVF-1*"
}
}
}
}
@Viduranga Wijesooriya, as you stated, "default_operator": "AND" checks for the presence of both SVF and 1, but an exact match alone is still not possible.
It does filter the results more appropriately, leaving all combinations of SVF and 1 and sorting them by relevance, which promotes SVF-1 up the order.
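A small Python model of the OR versus AND difference, using the two parts SVF and 1* named above and fnmatch for the wildcard. It is a simplification of query_string, not its implementation, and "123-SVF" is a hypothetical extra value added only to show that AND does not enforce order or adjacency.

```python
import re
from fnmatch import fnmatchcase


def tokens(text):
    # Standard-analyzer stand-in: split on "-" and other non-word characters, lowercase.
    return [t.lower() for t in re.findall(r"\w+", text)]


def matches(patterns, text, operator):
    # Each pattern must (AND) or may (OR) match some token of the value.
    found = [any(fnmatchcase(tok, p) for tok in tokens(text)) for p in patterns]
    return all(found) if operator == "AND" else any(found)


NAMES = ["SVF-123", "SVF-234", "123-SVF"]  # "123-SVF" is hypothetical
PATTERNS = ["svf", "1*"]  # the two parts: SVF and 1*
or_names = [n for n in NAMES if matches(PATTERNS, n, "OR")]
and_names = [n for n in NAMES if matches(PATTERNS, n, "AND")]
print(or_names)   # ['SVF-123', 'SVF-234', '123-SVF']
print(and_names)  # ['SVF-123', '123-SVF']
```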
To pull out only the exact result, map the field with a keyword tokenizer and a lowercase filter:
"settings": {
"analysis": {
"analyzer": {
"analyzer_keyword": {
"type": "custom",
"tokenizer": "keyword",
"filter": [
"lowercase"
]
}
}
}
},
"mappings": {
"type": {
"properties": {
"name": {
"type": "string",
"analyzer": "analyzer_keyword"
}
}
}
}
and the query is:
{
"query": {
"bool": {
"must": [
{
"query_string" : {
"fields": ["name"],
"query" : "*svf-1*",
"analyze_wildcard": true
}
}
]
}
}
}
Result:
{
"took": 4,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "play",
"_type": "type",
"_id": "AVfXzn3oIKphDu1OoMtF",
"_score": 1,
"_source": {
"name": "SVF-123"
}
}
]
}
}