Elasticsearch wildcard query string with fuzziness

We have an index of items, and I'm attempting to do a fuzzy wildcard search on the item names.
The query:
{
  "from": 0,
  "size": 10,
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "fields": [
            "name.suggest"
          ],
          "query": "avacado*",
          "fuzziness": 0.7
        }
      }
    }
  }
}
The field in the index and the analyzers at play:
"
suggest_analyzer":{
"type": "custom",
"tokenizer": "standard",
"filter": ["standard", "lowercase", "shingle", "punctuation"]
}
"punctuation" : {
"type" : "word_delimiter",
"preserve_original": "true"
}
"name": {
"fields": {
"name": {
"type": "string",
"analyzer": "stem"
},
"suggest":{
"type": "string",
"analyzer": "suggest_analyzer"
},
"untouched": {
"include_in_all": false,
"index": "not_analyzed",
"index_options": "docs",
"omit_norms": true,
"type": "string"
},
"untouched_lowercase": {
"type": "string",
"index_analyzer": "lowercase",
"search_analyzer": "lowercase"
}
},
"type": "multi_field"
},
The problem is this: an item with the name "Avocado Test" will match for the following:
avocado*
avo*
avacado
but fails to match for:
avacado*
ava*
ava~2
I can't seem to make fuzziness work with wildcards; it seems that either fuzziness works or wildcards work, but not in combination.
The ES version is 1.3.1.
Note that my query is simplified and we have other filtering going on but I boiled it down to just the query to take any ambiguity out of the results. I've attempted to use the suggest features but they won't allow the level of filtering we need.
Is there any other way to handle doing suggest/typeahead style searching with fuzziness to catch misspellings?

You could try the edge n-gram token filter: use it in an analyzer applied to the desired field and do a fuzzy search on it.
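For illustration, here is a minimal sketch of that idea against the 1.x-style API used in the question (the index, type, filter, and analyzer names are invented). The edge n-gram filter indexes every prefix of each token at index time:

PUT /items
{
  "settings": {
    "analysis": {
      "filter": {
        "prefix_filter": {
          "type": "edge_ngram",
          "min_gram": 2,
          "max_gram": 20
        }
      },
      "analyzer": {
        "prefix_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "prefix_filter"]
        }
      }
    }
  },
  "mappings": {
    "item": {
      "properties": {
        "name": {
          "type": "string",
          "index_analyzer": "prefix_analyzer", // n-grams at index time only
          "search_analyzer": "standard"        // leave the query text intact
        }
      }
    }
  }
}

With the prefixes indexed, the wildcard is no longer needed, so a plain match query can carry the fuzziness on its own:

GET /items/_search
{
  "query": {
    "match": {
      "name": {
        "query": "avacado", // reaches the full token "avocado" within one edit
        "fuzziness": 1      // likewise "ava" would reach the indexed prefix "avo"
      }
    }
  }
}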

Related

Speeding up Cloudant query for type text index

We have a table with this type of structure:
{_id: 15_0, createdAt: 1/1/1, task_id: [16_0, 17_0, 18_0], table: "details", a: b, c: d, more}
We created an index using:
{
  "index": {},
  "name": "paginationQueryIndex",
  "type": "text"
}
It auto-created:
{
  "ddoc": "_design/28e8db44a5a0862xxx",
  "name": "paginationQueryIndex",
  "type": "text",
  "def": {
    "default_analyzer": "keyword",
    "default_field": {},
    "selector": {},
    "fields": [],
    "index_array_lengths": true
  }
}
We are using the following query:
{
  "selector": {
    "createdAt": { "$gt": 0 },
    "task_id": { "$in": [ "18_0" ] },
    "table": "details"
  },
  "sort": [ { "createdAt": "desc" } ],
  "limit": 20
}
The query takes 700-800 ms the first time; after that it drops to 500-600 ms.
Why does it take longer the first time?
Is there any way to speed up the query?
Is there any way to add indexes only for specific fields when the type is "text" (instead of indexing all the fields in these records)?
You could try creating the index more explicitly, defining the type of each field you wish to index, e.g.:
{
  "index": {
    "fields": [
      {
        "name": "createdAt",
        "type": "string"
      },
      {
        "name": "task_id",
        "type": "string"
      },
      {
        "name": "table",
        "type": "string"
      }
    ]
  },
  "name": "myindex",
  "type": "text"
}
Then your query becomes:
{
  "selector": {
    "createdAt": { "$gt": "1970/01/01" },
    "task_id": { "$in": [ "18_0" ] },
    "table": "details"
  },
  "sort": [ { "createdAt": "desc" } ],
  "limit": 20
}
Notice that I used strings where the data type is a string.
If you're interested in performance, try removing clauses from your query one at a time to see if one of them is causing the performance problem. You can also look at the explanation of your query to see whether it is using your index correctly.
Documentation on creating an explicit text query index is in the Cloudant documentation.
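For the explanation part: Cloudant (like CouchDB) exposes an _explain endpoint that accepts the same body as _find and reports which index the query planner chose. A quick sketch (the database name is a placeholder):

POST /mydb/_explain
{
  "selector": {
    "createdAt": { "$gt": "1970/01/01" },
    "task_id": { "$in": [ "18_0" ] },
    "table": "details"
  },
  "sort": [ { "createdAt": "desc" } ],
  "limit": 20
}

If the explicit index is being picked up, the response's index section should reference myindex rather than the auto-created design document.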

Elastic search mapping using node.js

I have created an Elasticsearch index using the DSL query below.
It was already created manually, but I am trying to index data using mongoosastic with Node.js. I am using the synchronize method to index my MongoDB collection into Elasticsearch. What should my Node.js mapping code be so that the data is indexed properly?
{
  "settings": {
    "number_of_shards": 1,
    "analysis": {
      "filter": {
        "ngram_filter": { // n-gram filter used by the analyzer below
          "type": "ngram",
          "min_gram": 2,
          "max_gram": 20
        }
      },
      "analyzer": {
        "ngram_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "ngram_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "employees": {
      "_all": {
        "type": "string",
        "index_analyzer": "ngram_analyzer",
        "search_analyzer": "standard"
      },
      "properties": { // schema start
        "FirstName": {
          "type": "string",
          "include_in_all": true,
          "term_vector": "yes",
          "index_analyzer": "ngram_analyzer",
          "search_analyzer": "standard"
        } // it has more fields, as given in the schema below
      } // schema end
    }
  }
}
My MongoDB collection schema is:
{
  "FirstName": "MISTI",
  "LastName": "RAMSTAD",
  "Designation": "CEO",
  "Salary": "148000",
  "DateOfJoining": "23/09/1997",
  "Address": "32 Pawnee Ave. San Pablo, CA 94806",
  "Gender": "Female",
  "Age": 55,
  "MaritalStatus": "Unmarried",
  "Interests": "Letterboxing,Scuba Diving,Mountain Biking,Handwriting Analysis,Models"
}
You can see the answer below: you can create the index with its settings and mapping from your index.ts file when the server starts.
Also, if you want to update your mapping, just make your update and restart the server.
Elastic Search when to add dynamic mappings
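As a rough sketch of that startup flow (shown as the raw requests; with the official elasticsearch JavaScript client these correspond to indices.exists and indices.create, and the JSON bodies are passed through unchanged):

HEAD /employees    // 404 means the index does not exist yet
PUT /employees     // if missing, create it before synchronize() streams any documents
{
  "settings": {},  // the "analysis" settings shown in the question go here
  "mappings": {}   // the "employees" mapping shown in the question goes here
}

To change the mapping later, delete and re-create the index and re-run synchronize(), since mappings of existing fields cannot be modified in place.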

Elastic Search: Dynamic Template Mapping for Geo Point Field

Is dynamic mapping for geo point still working in Elastic Search 2.x/5.x?
This is the template:
{
  "template": "*",
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "geo_point_type": {
            "match_mapping_type": "string",
            "match": "t_gp_*",
            "mapping": {
              "type": "geo_point"
            }
          }
        }
      ]
    }
  }
}
This is the error I get when I query the field:
"reason": "failed to parse [geo_bbox] query. field [t_gp_lat-long#en] is expected to be of type [geo_point], but is of [string] type instead"
I seem to remember seeing somewhere in the documentation that this doesn't work, but I thought that was only the case when there is no dynamic template at all.
Any idea?
Update 1
Here's a sample of the document. The actual document is very big, so I kept only the relevant part of it.
{
  "_index": "route",
  "_type": "route",
  "_id": "583a014edd76239997fca5e4",
  "_score": 1,
  "_source": {
    "t_b_highway#en": false,
    "t_n_number-of-floors#en": 33,
    "updatedBy#_id": "58059fe368d0a739916f0888",
    "updatedOn": 1480196430596,
    "t_n_ceiling-height#en": 2.75,
    "t_gp_lat-long#en": "13.736248,100.5604997"
  }
}
The data looks correct to me, since you can also index a geo point field with a lat/long string.
Update 2
The mapping is definitely wrong. That's why I'm wondering whether you can dynamically map a geo point field.
"t_gp_lat-long#en": {
"type": "string",
"fields": {
"english": {
"type": "string",
"analyzer": "english"
},
"raw": {
"type": "string",
"index": "not_analyzed",
"ignore_above": 256
}
}
},
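One thing worth checking (a hedged guess, since the thread doesn't resolve it): index templates only apply to indices created after the template is registered, and the mapping of an existing field cannot be changed in place. If the route index, or the first document containing a t_gp_* field, predates the template, the field stays a string with the default sub-fields shown above. A sketch of the recovery path (the template name is invented):

PUT /_template/geo_point_template
{
  "template": "*",
  "mappings": {
    "_default_": {
      "dynamic_templates": [
        {
          "geo_point_type": {
            "match_mapping_type": "string",
            "match": "t_gp_*",
            "mapping": { "type": "geo_point" }
          }
        }
      ]
    }
  }
}

DELETE /route   // then re-create and reindex; the first t_gp_* value indexed
                // afterwards should be dynamically mapped as geo_point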

Prioritize exact words first and then others while using query_string query in elasticsearch

If I search for the term "Liebe", the current query and the analyzers used prioritize results where "Liebe" is part of a different word, such as "Verlieben", over results containing only this word.
It should be the other way around.
I am also using some advanced filters and aggregations. But here is the most basic query that I use to search:
{
  "query": {
    "query_string": {
      "query": "Liebe",
      "default_operator": "AND",
      "analyzer": "my_analyzer1"
    }
  },
  "size": "10",
  "from": 0
}
The analyzers and index settings are as follows:
{
  "settings": {
    "analysis": {
      "filter": {
        "nGram_filter": {
          "type": "nGram",
          "min_gram": 2,
          "max_gram": 20,
          "token_chars": [
            "letter",
            "digit",
            "punctuation",
            "symbol"
          ]
        }
      },
      "analyzer": {
        "nGram_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "asciifolding",
            "nGram_filter"
          ]
        },
        "whitespace_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        },
        "my_analyzer1": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "my_stop'.$s_id.'",
            "asciifolding"
          ]
        }
      }
    }
  },
  "mappings": {
    "product": {
      "_all": {
        "index_analyzer": "nGram_analyzer",
        "search_analyzer": "whitespace_analyzer"
      },
      "properties": {
        '.$mapping.'
      }
    }
  }
}
Please ignore the $mapping. These are the dynamic fields that reside in the index based on some settings in my framework.
Can anyone please point me in a direction where I don't need to change much and can get what I described above?
I have checked many things, like the match query. But I don't have any fixed fields, so I can't use that. And I want both the exact search and the search results that have a partial match (using nGrams).
Please help!
Thanks!
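One possible direction, sketched against the settings above rather than taken from the thread (the sub-field name "whole" and the template name are invented): since the fields are dynamic, a dynamic template can give every string field a whole-word sub-field that skips the n-gram filter, and a boosted should clause on those sub-fields then lifts exact-word hits above n-gram fragments. The mapping part would look like:

"mappings": {
  "product": {
    "_all": {
      "index_analyzer": "nGram_analyzer",
      "search_analyzer": "whitespace_analyzer"
    },
    "dynamic_templates": [
      {
        "whole_word_strings": {
          "match_mapping_type": "string",
          "mapping": {
            "type": "string",
            "index_analyzer": "nGram_analyzer",
            "search_analyzer": "whitespace_analyzer",
            "fields": {
              "whole": {
                "type": "string",
                "analyzer": "whitespace_analyzer" // whole words only, no n-grams
              }
            }
          }
        }
      }
    ] // existing properties ($mapping) stay as they are
  }
}

and the query keeps the original recall clause, adding precision as a boost:

{
  "query": {
    "bool": {
      "must": {
        "query_string": {
          "query": "Liebe",
          "default_operator": "AND",
          "analyzer": "my_analyzer1" // partial matches via n-grams, as before
        }
      },
      "should": {
        "query_string": {
          "fields": ["*.whole"],     // exact whole words rank first
          "query": "Liebe",
          "boost": 10
        }
      }
    }
  },
  "size": "10",
  "from": 0
}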

Searching by name on ElasticSearch

Say I have an index with thousands of names of clients and I need to be able to easily search them in a administration panel, like this:
John Anders
John Smith
Sarah Smith
Bjarne Stroustrup
I want to have full search capabilities on it, which means that:
If I search for John, I should get the John Anders and John Smith.
If I search for Smith, I should get both Smiths.
If I search for sarasmit or sara smit, I should get Sarah Smith as I searched for the initials of the name and the whitespace doesn't matter.
If I search for johers or joh ers, I should get John Anders as I searched for strings contained in the name.
I already figured out that I could use an analyzer with a lowercase filter and a keyword tokenizer, but they don't work for every case.
What is the right combination of tokenizers/analysers/queries to use?
Have a look at this; it is a question I asked regarding a similar data set. Here is a look at the index settings/mapping I have used to produce some decent results. Development has ceased on this for the interim; however, this is what I've produced so far. You can then develop your queries:
{
  "settings": {
    "number_of_shards": 5,
    "number_of_replicas": 0,
    "analysis": {
      "filter": {
        "synonym": {
          "type": "synonym",
          "synonyms_path": "synonyms/synonyms.txt"
        },
        "my_metaphone": {
          "type": "phonetic",
          "encoder": "metaphone",
          "replace": false
        }
      },
      "analyzer": {
        "synonym": {
          "tokenizer": "whitespace",
          "filter": [
            "lowercase",
            "synonym"
          ]
        },
        "metaphone": {
          "tokenizer": "standard",
          "filter": [
            "my_metaphone"
          ]
        },
        "porter": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "porter_stem"
          ]
        }
      }
    }
  },
  "mappings": {
    "mes": {
      "_all": {
        "enabled": false
      },
      "properties": {
        "pty_forename": {
          "type": "multi_field",
          "store": "yes",
          "fields": {
            "pty_forename": {
              "type": "string",
              "analyzer": "simple"
            },
            "metaphone": {
              "type": "string",
              "analyzer": "metaphone"
            },
            "porter": {
              "type": "string",
              "analyzer": "porter"
            },
            "synonym": {
              "type": "string",
              "analyzer": "synonym"
            }
          }
        },
        "pty_full_name": {
          "type": "string",
          "index": "not_analyzed",
          "store": "yes"
        },
        "pty_surname": {
          "type": "multi_field",
          "store": "yes",
          "fields": {
            "pty_surname": {
              "type": "string",
              "analyzer": "simple"
            },
            "metaphone": {
              "type": "string",
              "analyzer": "metaphone"
            },
            "porter": {
              "type": "string",
              "analyzer": "porter"
            },
            "synonym": {
              "type": "string",
              "analyzer": "synonym"
            }
          }
        }
      }
    }
  }
}
Note that I have used the phonetic plugin, and I also have a comprehensive list of synonyms, which is critical for returning results for richard when dick is entered.
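For example (a sketch only; the index name and boosts are made up), a query against that mapping could fan out across the sub-fields, letting the literal forms dominate while the phonetic and stemmed forms catch variants like sara for Sarah:

GET /clients/_search
{
  "query": {
    "multi_match": {
      "query": "sarah smith",
      "fields": [
        "pty_forename^3",          // literal simple-analyzed form, boosted
        "pty_surname^3",
        "pty_forename.metaphone",  // phonetic variants
        "pty_surname.metaphone",
        "pty_forename.porter",     // stemmed variants
        "pty_surname.porter",
        "pty_forename.synonym",    // nicknames via synonyms.txt
        "pty_surname.synonym"
      ]
    }
  }
}

Note that the concatenated "sarasmit" case from the question is not covered by this mapping; that would still need something like an n-gram field on pty_full_name.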
