{
"query": {
"bool": {
"must": {
"bool": {
"must": [
{
"multi_match": {
"query": "FGTHSF-2124-6",
"type": "phrase_prefix",
"fields": [
"contact.name"
]
}
},
{
"terms": {
"contact.id": [
"sdfwerwe",
"6789",
"4567",
"12345"
]
}
}
],
"should": [],
"must_not": []
}
},
"filter": {
"bool": {
"must": {
"bool": {
"must": [],
"should": [],
"must_not": []
}
}
}
}
}
}
}
I have this query. When the last character of my search text is a single digit I get no results, but if I give three digits it finds the proper record. Is there a default size for the phrase_prefix query, and if so, how do I change it?
I tried setting the default operator, prefix_length, max_expansions, etc.
contact.name has "standard" as its search_analyzer and "autocomplete" as its index analyzer (the settings are not available in the question). Your issue is that the field is converted to different tokens at index time and at search time, so they do not match.
Usually, the same analyzer should be applied at index time and at
search time, to ensure that the terms in the query are in the same
format as the terms in the inverted index.
Sometimes, though, it can make sense to use a different analyzer at
search time, such as when using the edge_ngram tokenizer for
autocomplete.
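In mapping terms, the field described above presumably looks something like this (a sketch based only on the analyzer names given; the actual mapping is not in the question):
"contact": {
  "properties": {
    "name": {
      "type": "text",
      "analyzer": "autocomplete",
      "search_analyzer": "standard"
    }
  }
}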
For example, with the settings below:
"settings": {
"analysis": {
"filter": {
"autocomplete_filter": {
"type": "edge_ngram",
"min_gram": 3,
"max_gram": 3
}
},
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"autocomplete_filter"
]
}
}
}
},
"mappings": {
.....
}
The following tokens will be stored in the index:
GET index79/_analyze
{
"text": "FGTHSF-2124-645",
"analyzer": "autocomplete"
}
{
"tokens" : [
{
"token" : "fgt",
"start_offset" : 0,
"end_offset" : 6,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "212",
"start_offset" : 7,
"end_offset" : 11,
"type" : "<NUM>",
"position" : 1
},
{
"token" : "645",
"start_offset" : 12,
"end_offset" : 15,
"type" : "<NUM>",
"position" : 2
}
]
}
Now when you search for "query": "HSF-2124-6", it will not return any document (the query analyzes to ["HSF", "2124", "6"]) because "6" is not present in any indexed token. To return the document you need to change "min_gram" to 1, so that tokens of sizes 1 through 3 are generated (645 => 6, 64, 645).
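That is, only min_gram changes in the filter definition from the example settings above:
"autocomplete_filter": {
  "type": "edge_ngram",
  "min_gram": 1,
  "max_gram": 3
}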
If you are not using edge_ngram, then please add your autocomplete analyzer definition to the question.
Edit 1:
If you look in your snapshot, the min gram size is 5 and the max gram size is 20. The minimum token generated is of size 5; for example, for "Abhijeet" the tokens generated will be "Abhij", "Abhije", "Abhijee", "Abhijeet". So any text shorter than size 5 will not match, e.g. "Abhi". In your case the text is split on the hyphen ("-"), so "6" is not matching.
You need to update your index settings and make min_gram: 1.
Steps.
1. POST /index80/_close
2. Update the settings. Below is just an example; in your settings, copy the entire analysis section, update min_gram, and do a PUT request:
PUT index80/_settings
{
"analysis": {
"analyzer": {
"my_analyzer": {
"tokenizer": "my_tokenizer"
}
},
"tokenizer": {
"my_tokenizer": {
"type": "edge_ngram",
"min_gram": 1,
"max_gram": 10,
"token_chars": [
"letter",
"digit"
]
}
}
}
}
3. POST /index80/_open
Check here on how to update settings for further info.
Note: Reducing the min gram size will cause additional tokens to be created and will increase the size of your index. The choice of min and max gram should be based on your query text size.
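Once the index is reopened, you can verify the new tokens with a quick check (a sketch using the example analyzer name above; substitute your own autocomplete analyzer if you copied your original analysis settings):
GET index80/_analyze
{
  "analyzer": "my_analyzer",
  "text": "FGTHSF-2124-6"
}
With min_gram set to 1, the output now includes the single-character token "6", so the search can match.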
Related
I have a name field with the value "abc_name", so when I search "abc_" I get proper results, but when I search "abc_##£&-#&" I still get the same results. I want my query to ignore special characters that don't match my query.
My query has:
multi_match
type as cross_fields
operator AND
I am using the "standard" search_analyzer for my fields.
And I want to keep this structure as it is, otherwise it will affect my other search behaviour:
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"analyzer": "autocomplete",
"search_analyzer": "standard"
}
Please see the below sample, where I've created a custom analyzer that would fit your use case:
Sample Mapping:
PUT some_test_index
{
"settings": {
"analysis": {
"analyzer": {
"my_custom_analyzer": {
"type": "custom",
"tokenizer": "custom_tokenizer",
"filter": ["lowercase", "3_5_edge_ngram"]
}
},
"tokenizer": {
"custom_tokenizer": {
"type": "pattern",
"pattern": "\\w+_+[^a-zA-Z\\d\\s_]+|\\s+". <---- Note this pattern
}
},
"filter": {
"3_5_edge_ngram": {
"type": "edge_ngram",
"min_gram": 3,
"max_gram": 5
}
}
}
},
"mappings": {
"properties": {
"my_field":{
"type": "text",
"analyzer": "my_custom_analyzer"
}
}
}
}
The above-mentioned pattern would simply ignore tokens with a format like abc_$%^^##. As a result, such a token would not be indexed.
Note that the way the analyzer works is:
First executes tokenizer
Then applies the edge_ngram filter on the tokens generated.
You can verify this by simply removing the edge_ngram filter from the above mapping, to first understand what tokens are generated by the tokenizer, via the Analyze API:
POST some_test_index/_analyze
{
"analyzer": "my_custom_analyzer",
"text": "abc_name asda efg_!##!## 1213_adav"
}
Tokens generated:
{
"tokens" : [
{
"token" : "abc_name",
"start_offset" : 0,
"end_offset" : 8,
"type" : "word",
"position" : 0
},
{
"token" : "asda",
"start_offset" : 9,
"end_offset" : 13,
"type" : "word",
"position" : 1
},
{
"token" : "1213_adav",
"start_offset" : 25,
"end_offset" : 34,
"type" : "word",
"position" : 2
}
]
}
Note that the token efg_!##!## has been removed.
I've added the edge_ngram filter because you would want the search to succeed when you search with abc_ and the token generated by the tokenizer is abc_name.
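For example, running the full analyzer (with the edge_ngram filter included) on a single value shows the prefix tokens that make a search for abc_ succeed (against the index defined above):
POST some_test_index/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "abc_name"
}
This should produce the tokens abc, abc_ and abc_n, i.e. the 3- to 5-character edge n-grams of abc_name.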
Sample Document:
POST some_test_index/_doc/1
{
"my_field": "abc_name asda efg_!##!## 1213_adav"
}
Query Request:
Use-case 1:
POST some_test_index/_search
{
"query": {
"match": {
"my_field": "abc_"
}
}
}
Use-case-2:
POST some_test_index/_search
{
"query": {
"match": {
"my_field": "efg_!##!##"
}
}
}
Responses:
Response for use-case-1:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.47992462,
"hits" : [
{
"_index" : "some_test_index",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.47992462,
"_source" : {
"my_field" : "abc_name asda efg_!##!## 1213_adav"
}
}
]
}
}
Response for use-case-2:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
Updated Answer:
Create your mapping as follows based on the index I've created and let me know if that works:
PUT some_test_index
{
"settings": {
"analysis": {
"analyzer": {
"my_custom_analyzer": {
"type": "custom",
"tokenizer": "punctuation",
"filter": ["lowercase"]
}
},
"tokenizer": {
"punctuation": {
"type": "pattern",
"pattern": "\\w+_+[^a-zA-Z\\d\\s_]+|\\s+"
}
}
}
},
"mappings": {
"properties": {
"my_field":{
"type": "text",
"analyzer": "autocompete", <----- Assuming you have already this in setting
"search_analyzer": "my_custom_analyzer". <----- Note this
}
}
}
}
Please try and let me know if this works for all your use-cases.
I want to aggregate on the brand field, but it gives me two results instead of one.
From this text:
{name : "Brand 1"}
the brand_aggs gives me 2 results: "Brand" and "1".
But why? I need only "Brand 1".
It separates the words "Brand" and "1" from "Brand 1" and gives me 2 results in the aggregation.
My mappings, where I want to aggregate:
mapping = {
"mappings": {
"product": {
"properties": {
"categories": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"fielddata": True
},
"brand": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"fielddata": True
}
}
}
}
}
my post request
{
"query" : {
"bool": {
"must": [
{"match": { "categories": "AV8KW5Wi31qHZdVeXG4G" }}
]
}
},
"size" : 0,
"aggs" : {
"brand_aggs" : {
"terms" : { "field" : "brand" }
},
"categories_aggs" : {
"terms" : { "field" : "categories" }
}
}
}
response from the server
{
"took": 18,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0,
"hits": []
},
"aggregations": {
"categories_aggs": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "av8kw5wi31qhzdvexg4g",
"doc_count": 1
},
{
"key": "av8kw61c31qhzdvexg4h",
"doc_count": 1
},
{
"key": "av8kxtch31qhzdvexg4a",
"doc_count": 1
}
]
},
"brand_aggs": {
"doc_count_error_upper_bound": 0,
"sum_other_doc_count": 0,
"buckets": [
{
"key": "1", <==== I dont need this , why is give me that ??
"doc_count": 1
},
{
"key": "brand",
"doc_count": 1
}
]
}
}
}
Your mapping has the fields property, which is used when you want to have multiple analyzers for the same field. In your case the valid name of your field is 'brand.keyword'. When you run your aggregation on just 'brand', it uses the default mapping defined for string.
So your query should be:
{
"query" : {
"bool": {
"must": [
{"match": { "categories": "AV8KW5Wi31qHZdVeXG4G" }}
]
}
},
"size" : 0,
"aggs" : {
"brand_aggs" : {
"terms" : { "field" : "brand.keyword" }
},
"categories_aggs" : {
"terms" : { "field" : "categories.keyword" }
}
}
}
The fields property is useful when you want to search the same property with multiple analyzers, for example:
"full_name": {
"type": "text",
"analyzer": "standard",
"boost": 1,
"fields": {
"autocomplete": {
"type": "text",
"analyzer": "ngram_analyzer"
},
"standard":{
"type": "text",
"analyzer": "standard"
}
}
},
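A query can then target whichever variant fits the use case (a hypothetical sketch using the sub-field names above):
{
  "query": {
    "match": { "full_name.autocomplete": "joh" }
  }
}
while full_name itself is still analyzed with the standard analyzer.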
You need to map your string as a not_analyzed string. For that, run the below query:
PUT your_index/_mapping/your_type
{
"your_type": {
"properties": {
"brand": {
"type": "string",
"index": "analyzed",
"fields": {
"raw": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
Don't forget to replace your_type and your_index with your actual type and index values.
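With that in place, the aggregation should run against the not_analyzed sub-field (a sketch following the mapping above):
{
  "size": 0,
  "aggs": {
    "brand_aggs": {
      "terms": { "field": "brand.raw" }
    }
  }
}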
Background: I've implemented a partial search on a name field by indexing the tokenized name (name field) as well as a trigram analyzed name (ngram field).
I've boosted the name field to have exact token matches bubble up to the top of the results.
Problem: I am trying to implement a query that limits the nGram matches to ones that only match some threshold (say 80%) of the query string. I understand that minimum_should_match seems to be what I am looking for, but my problem is forming the query to actually produce those results.
My exact token matches are boosted to the top but I still get every document that has a single matching trigram in the ngram field.
GIST: Index settings and mapping
Index Settings
{
"my_index": {
"settings": {
"index": {
"number_of_shards": "5",
"max_result_window": "30000",
"creation_date": "1475853851937",
"analysis": {
"filter": {
"ngram_filter": {
"type": "ngram",
"min_gram": "3",
"max_gram": "3"
}
},
"analyzer": {
"ngram_analyzer": {
"filter": [
"lowercase",
"ngram_filter"
],
"type": "custom",
"tokenizer": "standard"
}
}
},
"number_of_replicas": "1",
"uuid": "AuCjcP5sSb-m59bYrprFcw",
"version": {
"created": "2030599"
}
}
}
}
}
Index Mappings
{
"my_index": {
"mappings": {
"my_type": {
"properties": {
"acw": {
"type": "integer"
},
"pcg": {
"type": "integer"
},
"date": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"dob": {
"type": "date",
"format": "strict_date_optional_time||epoch_millis"
},
"id": {
"type": "string"
},
"name": {
"type": "string",
"boost": 10
},
"ngram": {
"type": "string",
"analyzer": "ngram_analyzer"
},
"bdk": {
"type": "integer"
},
"mmw": {
"type": "integer"
},
"mpi": {
"type": "integer"
},
"sex": {
"type": "string",
"index": "not_analyzed"
}
}
}
}
}
}
Solution Attempts
[GIST: Query Attempts] unlinkifying due to 2 link limit :(
(https://gist.github.com/jordancardwell/2e690013666e7e1da6ef1acee314b4e6)
I tried a multi-match query, which gives me correct search results, but I haven't had luck omitting results for names that only match a single trigram (say "odo" trigram inside "theodophilus")
//this matches 'frodo' and sends results to the top, since `name` field is boosted
// but also matches 'theodore' and 'rodolpho'
{
"size":100,
"from":0,
"query":{
"multi_match":{
"query":"frodo",
"fields":[
"name",
"ngram"
],
"type":"best_fields"
}
}
}
//I then tried to throw in the `minimum_should_match` option
// hoping it would filter out large strings that only had one matching trigram for instance
{
"size":100,
"from":0,
"query":{
"multi_match":{
"query":"frodo",
"fields":[
"name",
"ngram"
],
"type":"best_fields",
"minimum_should_match": "90%",
}
}
}
I've tried playing around in Sense, manually producing the match queries that this generates, so I could apply minimum_should_match to only the ngram field, but I can't seem to get the syntax right.
// I then tried to construct a custom query to just return the `minimum_should_match`d results on the ngram field
// I started with a query produced by using bodybuilder to `and` and `or` my other search criteria together
{
"query": {
"bool": {
"filter": {
"bool": {
"must": [
//each separate field's criteria `must`/`and`ed together
{
"query": {
"bool": {
"filter": {
"bool": {
"should": [
//each criterion for a specific field `should`/`or`ed together
{
//my attempt at getting `ngram` field results..
// should theoretically only return when field
// contains nothing but matching ngrams
// (i.e. exact matches and other fluke matches)
"query": {
"match": {
"ngram": {
"query": "frodo",
"minimum_should_match": "100%"
}
}
}
}
//... other criteria to be `should`/`or`ed together
]
}
}
}
}
}
//... other criteria to be `must`/`and`ed together
]
}
}
}
}
}
Can anyone see what I'm doing wrong?
It seems like this should be fairly straightforward to accomplish, but I must be missing something obvious.
UPDATE
I ran a query with _explain=true (using sense UI) to try to understand my results.
I queried for a match on the ngram field for "frod" with minimum_should_match = 100%, yet I still get every record that matches at least one ngram.
(e.g. rodolpho even though it doesn't contain fro)
GIST: test query and results
note: cross-posted from [discuss.elastic.co]
will make a link later, can't post more than 2 yet : /
(https://discuss.elastic.co/t/ngram-partial-match-limiting-ngram-results-in-multiple-field-query/62526)
I used your settings and mappings to create an index, and your queries seem to be working fine for me. I would suggest doing an explain on one of the "unexpected" documents being returned, to see why it matches and is returned with the other results.
Here is what I did:
Run the analyze api on your analyzer to see how the query will be split into tokens.
curl -XGET 'localhost:9200/my_index/_analyze' -d '
{
"analyzer" : "ngram_analyzer",
"text" : "frodo"
}'
frodo will be split into 3 tokens with your analyzer.
{
"tokens": [
{
"token": "fro",
"start_offset": 0,
"end_offset": 5,
"type": "word",
"position": 0
},
{
"token": "rod",
"start_offset": 0,
"end_offset": 5,
"type": "word",
"position": 0
},
{
"token": "odo",
"start_offset": 0,
"end_offset": 5,
"type": "word",
"position": 0
}
]
}
I indexed 3 documents for testing (only used the ngram field). Here are the docs:
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 3,
"max_score": 1,
"hits": [
{
"_index": "my_index",
"_type": "my_type",
"_id": "2",
"_score": 1,
"_source": {
"ngram": "theodore"
}
},
{
"_index": "my_index",
"_type": "my_type",
"_id": "1",
"_score": 1,
"_source": {
"ngram": "frodo"
}
},
{
"_index": "my_index",
"_type": "my_type",
"_id": "3",
"_score": 1,
"_source": {
"ngram": "rudolpho"
}
}
]
}
}
The first query you mentioned matches frodo and theodore, but not rudolpho, as you said - which makes sense, since rudolpho does not produce any trigrams which match trigrams from frodo:
frodo -> fro, rod, odo
rudolpho -> rud, udo, dol, olp, lph, pho
Using your second query, I get back only frodo (neither of the other two):
{
"took": 5,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.53148466,
"hits": [
{
"_index": "my_index",
"_type": "my_type",
"_id": "1",
"_score": 0.53148466,
"_source": {
"ngram": "frodo"
}
}
]
}
}
I then ran an explain (localhost:9200/my_index/my_type/2/_explain) on the other two docs (theodore and rudolpho) and I see this (I have clipped the response):
{
"_index": "my_index",
"_type": "my_type",
"_id": "2",
"matched": false,
"explanation": {
"value": 0,
"description": "Failure to meet condition(s) of required/prohibited clause(s)",
"details": [
{
"value": 0,
"description": "no match on required clause ((ngram:fro ngram:rod ngram:odo)~2)",
"details": [
The above is expected, since at least two out of the three tokens from frodo should match (90% of 3 clauses rounds down to 2).
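For reference, the explain call looks something like this (a sketch; the body is the second query from the question):
GET my_index/my_type/2/_explain
{
  "query": {
    "multi_match": {
      "query": "frodo",
      "fields": ["name", "ngram"],
      "type": "best_fields",
      "minimum_should_match": "90%"
    }
  }
}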
I am querying for aggregate data based on a date_range, like below:
"aggs": {
"range": {
"date_range": {
"field": "sold",
"ranges": [
{ "from": "2014-11-01", "to": "2014-11-30" },
{ "from": "2014-08-01", "to": "2014-08-31" }
]
}
}
}
Using this, I am getting this response:
"aggregations": {
"range": {
"buckets": [
{
"key": "2014-08-01T00:00:00.000Z-2014-08-31T00:00:00.000Z",
"from": 1406851200000,
"from_as_string": "2014-08-01T00:00:00.000Z",
"to": 1409443200000,
"to_as_string": "2014-08-31T00:00:00.000Z",
"doc_count": 1
},
{
"key": "2014-11-01T00:00:00.000Z-2014-11-30T00:00:00.000Z",
"from": 1414800000000,
"from_as_string": "2014-11-01T00:00:00.000Z",
"to": 1417305600000,
"to_as_string": "2014-11-30T00:00:00.000Z",
"doc_count": 2
}
]
}
}
but instead of only doc_count, I also need the complete aggregate data that satisfies each range.
Is there any way to get this? Please help.
It's not clear what other fields you're looking for so I've included a couple of examples.
By nesting another aggs inside your first one, you can ask Elasticsearch to pull back additional values e.g. averages, sums, counts, min, max, stats, etc.
This example query will bring back field_count (a count of instances of myfield) and also order_count (a sum based on a script):
"aggs": {
"range": {
"date_range": {
"field": "sold",
"ranges": [
{ "from": "2014-11-01", "to": "2014-11-30" },
{ "from": "2014-08-01", "to": "2014-08-31" }
]
}
}
},
"aggs" : {
"field_count": {"value_count" : { "field" : "myfield" } },
"order_count": {"sum" : {"script" : " doc[\"output_msgtype\"].value == \"order\" ? 1 : 0"} } }}
}
If you aren't looking for any sums, counts, averages on your data - then an aggregation isn't going to help.
I would instead run a standard query once per range. e.g.:
curl -XGET 'http://localhost:9200/test/cars/_search?pretty' -d '{
"fields" : ["price", "color", "make", "sold" ],
"query":{
"filtered": {
"query": {
"match_all" : { }
},
"filter" : {
"range": {"sold": {"gte": "2014-09-21T20:03:12.963","lte": "2014-09-24T20:03:12.963"}}}
}
}
}'
Repeat this query as needed, modifying the range each time.
I searched a lot on this and tried numerous combinations, but failed in all attempts :(.
Here is my problem:
I created a jdbc-river in elastic search as below:
{
"type" : "jdbc",
"jdbc" : {
"driver" : "oracle.jdbc.driver.OracleDriver",
"url" : "jdbc:oracle:thin:#//ip:1521/db",
"user" : "user",
"password" : "pwd",
"sql" : "select f1, f2, f3 from table"
},
"index" : {
"index" : "subject2",
"type" : "name2",
"settings": {
"analysis": {
"analyzer": {
"my_analizer": {
"type": "custom",
"tokenizer": "my_pattern_tokenizer",
"filter": []
}
},
"tokenizer": {
"my_pattern_tokenizer": {
"type": "pattern",
"pattern": "$^"
}
},
"filter": []
}
}
},
"mappings":
{
"subject2":
{
"properties" : {
"f1" : {"index" : "not_analyzed", "store": "yes", "analyzer": "my_analizer", "search_analyzer": "keyword", "type": "string"},
"f2" : {"index" : "not_analyzed", "store": "yes", "analyzer": "my_analizer", "search_analyzer": "keyword", "type": "string"},
"f3" : {"index" : "not_analyzed", "store": "yes", "analyzer": "my_analizer", "search_analyzer": "keyword", "type": "string"}
}
}
}
}
I want to implement an auto-complete feature that matches the user-entered value against the data in the "f1" field (say, as of now), but from the start of the value.
Data in the f1 field is like
"Hardin County ABC"
"Country of XYZ"
"County of Blah blah"
"County of Blah second"
The requirement is: when the user types "Coun", then the 2nd, 3rd and 4th results should be returned by Elasticsearch, and not the first. I read about the "keyword" analyzer, which makes the complete value a single token, but it is not working in this case.
Also, if the user types "County of B", then the 3rd and 4th options should be returned by Elasticsearch.
Below is how I am querying for results.
Option 1
{"from":0,"size":10, "query":{ "field" : { "f1" : "count*" } } }
Option 2
{"from":0,"size":10, "query":{ "span_first" : {
"match" : {
"span_term" : { "COMPANY" : "hardin" }
},
"end" : 1
} } }
Please tell me what I am doing wrong here. Thanks in advance.
Before I answer I want to point out that you are defining an analyzer and then setting index: not_analyzed, which means the analyzer is not used. (If you use not_analyzed it is the same as using the keyword analyzer: the whole string, untouched, is one token.)
Also analyzer: my_analizer is a shortcut for index_analyzer: my_analizer and search_analyzer: my_analizer, so your mapping is a bit confusing to me...
Also, the fields will be stored in the _source unless you turn that off; you don't need to store the fields separately unless you disable _source storage and need the field returned in the result set.
There are 2 ways I can think of doing this:
1. Use a match_phrase_prefix query - Easier and slow
Don't define any analyzers, you don't need them.
Mapping:
"subject2": {
"properties" : {
"f1" : { "type": "string" },
"f2" : { "type": "string" },
"f3" : { "type": "string" },
}
}
}
Query:
"match_phrase_prefix" : {
"f1" : {
"query" : "Count"
}
}
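Wrapped in a full search request it might look like this (a sketch; the index name subject2 is taken from the river definition, the query text from the question):
POST subject2/_search
{
  "query": {
    "match_phrase_prefix": {
      "f1": { "query": "County of B" }
    }
  }
}
This should return the 3rd and 4th documents from your sample data.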
2. Use an edge_ngram token filter - Harder and faster
"settings": {
"analysis": {
"analyzer": {
"edge_autocomplete": {
"type": "custom",
"tokenizer": "keyword",
"filter": ["my_edge_ngram"]
}
},
"filter" : {
"my_edge_ngram" : {
"type": "edgeNGram",
"min_gram": 2,
"max_gram": 15
}
}
}
}
Mapping:
"subject2": {
"properties" : {
"f1" : { "type": "string", "index": "edge_autocomplete" },
"f2" : { "type": "string", "index": "edge_autocomplete" },
"f3" : { "type": "string", "index": "edge_autocomplete" },
}
}
}
Query:
"match" : {
"f1" : "Count",
"analyzer": "keyword"
}
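To see why this matches from the start of the value, you can inspect the tokens the edge_autocomplete analyzer produces (a verification sketch; index name subject2 assumed):
GET subject2/_analyze
{
  "analyzer": "edge_autocomplete",
  "text": "County of Blah blah"
}
Because the tokenizer is keyword, the whole value stays one token, and every edge n-gram ("Co", "Cou", ...) starts at the beginning of the value - which is what anchors the autocomplete to the start.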
Good luck!
Have you tried an ngram filter? It will tokenize strings into character n-grams of the configured lengths. So your mapping could look like:
{
"settings": {
"analysis": {
"analyzer": {
"autocomplete": {
"type": "custom",
"tokenizer": "standard",
"filter": ["standard", "lowercase", "kstem", "ngram"]
}
},
"filter" : {
"ngram" : {
"type": "ngram",
"min_gram": 2,
"max_gram": 15
}
}
}
},
"mappings": {
"subject2": {
"properties" : {
"f1" : {
"type": "multi_field",
"fields": {
"f1": {
"type": "string"
},
"autocomplete": {
"analyzer": "autocomplete",
"type": "string"
},
...
This will index the ngram "count" for the 2nd, 3rd, and 4th values, which should give you the desired outcome.
Note that making "f1" a multi_field is not required. However, when you don't need the "autocomplete" analyzer, such as when returning "f1" in the search results, it is less expensive to use the plain "f1" sub-field. If you do use a multi_field, you can access "f1" as "f1" (without dot notation), but to access "autocomplete" you need dot notation: "f1.autocomplete".
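So a typeahead query would target the sub-field explicitly (a sketch following the mapping above):
{
  "query": {
    "match": { "f1.autocomplete": "count" }
  }
}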
The solution we finally implemented is a mix of approaches, but the answer by "ramseykhalaf" is the closest match. +1 to him.
What I did: whenever the user enters a word followed by a space, fire a phrase_prefix match query and show the closest matching results.
{"from":0,"size":10, "query":{ "match" : { "f1" : {"query" : "MICROSOU", "type" : "phrase_prefix", "boost":2} } } }
As soon as the user types any character after the space, I switch the query to a query_string with a wildcard; with multiple words in the field, the match is again very close to what the user is looking for.
{"from":0,"size":10, "query":{ "query_string" : { "default_field":"f1","query" : "micro int*", "boost":2 } } }
In this way we got the closest solution to this requirement. I would be happy to see a more optimized solution that satisfies the above use cases.
Just to add one more thing: the river I created is now plain vanilla, with fields as "not_analyzed" and analyzer as "keyword".