Elasticsearch doesn't find results when searching for the exact term - Node.js

I am using the elasticsearch module in my Node.js app to query my index with the fuzzy completion suggester. The text I'm trying to search for is "Rome–Fiumicino Leonardo da Vinci International Airport". When I search for this term I get no results, but if I cut the term to 50 characters, the search finds it and returns results.
const result = await elasticsearch.search({
index: 'myIndex',
body: {
suggest: {
fuzzinessZero: {
text,
completion: {
field: 'name_suggest',
fuzzy: {
fuzziness: 0,
},
contexts,
},
},
fuzzinessOne: {
text,
completion: {
field: 'name_suggest',
fuzzy: {
fuzziness: 1,
},
contexts,
},
},
fuzzinessTwo: {
text,
completion: {
field: 'name_suggest',
fuzzy: {
fuzziness: 2,
},
contexts,
},
},
},
}
})
This is the result I get in fuzzinessOne
As you can see, the result in the text field is cut to 50 characters (maybe that's the issue). Inside _source I get back all the inputs that are used for the search; one of them is the full, exact term I tried to search for, along with all the other available combinations.
It is worth mentioning that I'm using AWS OpenSearch.
These are the settings I use to create the index:
settings: {
analysis: {
filter: {
autocomplete_filter: {
type: 'edge_ngram',
min_gram: 2,
max_gram: 20,
},
shingle_filter: {
type: 'shingle',
max_shingle_size: 3,
},
},
analyzer: {
autocomplete: {
type: 'custom',
tokenizer: 'standard',
filter: ['lowercase', 'shingle_filter', 'asciifolding'],
},
},
},
}

You are facing this issue because the max_input_length parameter defaults to 50.
Below is the description given for this parameter in the documentation:
Limits the length of a single input, defaults to 50 UTF-16 code
points. This limit is only used at index time to reduce the total
number of characters per input string in order to prevent massive
inputs from bloating the underlying datastructure. Most use cases
won’t be influenced by the default value since prefix completions
seldom grow beyond prefixes longer than a handful of characters.
You can keep this default behaviour, or you can update your index mapping with a larger max_input_length value and reindex your data.
{
"mappings": {
"dynamic": "false",
"properties": {
"namesuggest": {
"type": "completion",
"analyzer": "keyword_lowercase_analyzer",
"preserve_separators": true,
"preserve_position_increments": true,
"max_input_length": 100,
"contexts": [
{
"name": "searchable",
"type": "CATEGORY"
}
]
}
}
},
"settings": {
"index": {
"mapping": {
"ignore_malformed": "true"
},
"refresh_interval": "5s",
"analysis": {
"analyzer": {
"keyword_lowercase_analyzer": {
"filter": [
"lowercase"
],
"type": "custom",
"tokenizer": "keyword"
}
}
},
"number_of_replicas": "0",
"number_of_shards": "1"
}
}
}
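To make the effect of the limit concrete, here is a minimal sketch in plain JavaScript. It is a simplified model, not what the suggester actually does internally: it assumes the input is simply cut to max_input_length UTF-16 code units at index time and that a suggestion only matches when the query is a prefix of the stored value. Analyzers, contexts, and fuzziness are ignored, and the function names are made up for illustration.

```javascript
// Simplified model of index-time truncation for a completion field.
// Assumption: analyzers, contexts, and fuzziness are ignored.
function storedInput(input, maxInputLength = 50) {
  // String.prototype.slice counts UTF-16 code units.
  return input.slice(0, maxInputLength);
}

// In this model a query matches only if it is a prefix of what was stored.
function prefixMatches(query, input, maxInputLength = 50) {
  return storedInput(input, maxInputLength).startsWith(query);
}

const airport = 'Rome–Fiumicino Leonardo da Vinci International Airport';

console.log(airport.length);                               // 54
console.log(prefixMatches(airport, airport));              // false
console.log(prefixMatches(airport.slice(0, 50), airport)); // true
console.log(prefixMatches(airport, airport, 100));         // true
```

The full name is 54 characters long, so under the default limit the stored value is shorter than the query and cannot have it as a prefix. With a limit of 100 the full term is kept.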


MongoDB query to find in nested schema

This query is returning the first object, but it should not, because it has the BU but in a different domain. It works fine when collaborators contains a single object; when there are multiple objects, it does not behave as expected. How can we do this? Any suggestions?
My criteria: a document should be returned if the collaborators array has an entry with
only the BU name, or
only the domain, or
both the BU name and the domain.
In the situation below, the first document has the same domain ({"domain": "xyz.com"}), but it is still not returned. Why?
[
{
name: "1",
collaborators: [
{
"domain": "xyz.com"
},
{
"buName": "Vignesh B"
},
{
"domain": "yz.com"
},
{
"domain": "xyz.com",
"buName": "Vignesh B"
}
]
},
{
name: "2",
collaborators: [
{
"domain": "xyz.com",
"buName": "Vignesh BU"
}
]
},
{
name: "3",
collaborators: [
{
"domain": "xyz.com"
}
]
},
{
name: "4",
collaborators: [
{
"buName": "Vignesh BU"
},
{
"domain": "xyz.com"
},
{
"domain": "xyz.com",
"buName": "Vignesh BU"
}
]
}
]
db.collection.find({
$or: [
{
"collaborators.domain": "xyz.com",
"collaborators.buName": {
"$exists": false
}
},
{
"collaborators.buName": "Vignesh BU",
"collaborators.domain": {
"$exists": false
}
},
{
"collaborators.buName": "Vignesh BU",
"collaborators.domain": "xyz.com"
}
]
})
It is not returning the first document because the buName values in this document are "Vignesh B" and not "Vignesh BU". Add a U to "Vignesh B" and it works.
Link to mongodb playground
I think there was a comment at one point that said the name: "1" document was expected to be returned (as it matches the second, "Only Domain" criterion), but currently it is not. This is because you need to use the $elemMatch operator, since you are querying an array with multiple conditions.
The query should look as follows, as demonstrated in this playground example (note that I've changed the name: 3 document so that it would not match):
db.collection.find({
$or: [
{
"collaborators": {
$elemMatch: {
"domain": "xyz.com",
"buName": {
"$exists": false
}
}
}
},
{
"collaborators": {
$elemMatch: {
"buName": "Vignesh BU",
"domain": {
"$exists": false
}
}
}
},
{
"collaborators": {
$elemMatch: {
"buName": "Vignesh BU",
"domain": "xyz.com"
}
}
}
]
})
Why is this change needed? It is because of the semantics of how querying an array works in MongoDB. When querying on multiple nested conditions without using $elemMatch you are telling the database that different entries in the array can each individually satisfy the requirements. As shown in this playground example, that means that when you run this query:
db.collection.find({
"arr.str": "abc",
"arr.int": 123
})
The following document will match:
{
_id: 1,
arr: [
{
str: "abc"
},
{
int: 123
}
]
}
This is because the first entry in the array satisfies one of the query predicates while the other entry in the array satisfies the second predicate. Changing the query to use $elemMatch changes the semantics to specify that a single entry in the array must successfully satisfy all query predicate conditions which prevents the document above from matching.
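The difference between the two semantics can be modelled in plain JavaScript. This is only a sketch that mimics the matching rules for simple equality predicates; it does not use MongoDB, and the helper names are made up for illustration.

```javascript
// Plain-JavaScript model of the two matching semantics (equality predicates only).
// `conditions` maps a field name inside the array entries to the required value.

// Dot notation: each condition may be satisfied by a different array entry.
function matchesDotNotation(entries, conditions) {
  return Object.entries(conditions).every(([field, value]) =>
    entries.some((entry) => entry[field] === value)
  );
}

// $elemMatch: one single entry has to satisfy every condition.
function matchesElemMatch(entries, conditions) {
  return entries.some((entry) =>
    Object.entries(conditions).every(([field, value]) => entry[field] === value)
  );
}

const doc = { _id: 1, arr: [{ str: 'abc' }, { int: 123 }] };
const conditions = { str: 'abc', int: 123 };

console.log(matchesDotNotation(doc.arr, conditions)); // true
console.log(matchesElemMatch(doc.arr, conditions));   // false
```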
In your specific situation the same thing was happening with your first set of conditions of:
{
"collaborators.domain": "xyz.com",
"collaborators.buName": {
"$exists": false
}
}
The first array item in the name: "1" document was matching the collaborators.domain condition. The problem was the second condition. While that same first array entry did not have a buName field, two of the other entries in the array did. Since there is no $elemMatch present, the database checked those other entries, found that the buName existed there, and that caused the query predicates to fail to match and for the document to not get returned. Adding the $elemMatch forces both of those checks to happen against the single entry in the array hence resolving the issue.
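The same reasoning can be replayed on the collaborators array of the name: "1" document. The sketch below is a plain-JavaScript model of the "Only Domain" condition with and without $elemMatch. It assumes that, without $elemMatch, { $exists: false } on collaborators.buName only holds when no entry in the array has a buName field. The function names are hypothetical.

```javascript
// collaborators array of the name: "1" document from the question.
const collaborators = [
  { domain: 'xyz.com' },
  { buName: 'Vignesh B' },
  { domain: 'yz.com' },
  { domain: 'xyz.com', buName: 'Vignesh B' },
];

// Model of { "collaborators.domain": "xyz.com", "collaborators.buName": { $exists: false } }:
// some entry has the domain, and no entry at all has a buName field.
function onlyDomainDotNotation(entries, domain) {
  return (
    entries.some((e) => e.domain === domain) &&
    entries.every((e) => !('buName' in e))
  );
}

// Model of the $elemMatch version: one entry has the domain and itself lacks buName.
function onlyDomainElemMatch(entries, domain) {
  return entries.some((e) => e.domain === domain && !('buName' in e));
}

console.log(onlyDomainDotNotation(collaborators, 'xyz.com')); // false
console.log(onlyDomainElemMatch(collaborators, 'xyz.com'));   // true
```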

Query Doesn't Match Numbers In Text

Match queries can find strings that contain numbers; in this case, I am trying to find matching phone numbers. The mappings and analyzers are provided below. For example, I have a document as follows:
{
"userId": 126817,
"name": "Test User",
"phoneNumber": "5551112233"
}
When I use a match query, it doesn't match anything:
{"match" : {"phoneNumber": {"query": "555"}}}
When I use a prefix query, it does match:
{"prefix" : {"phoneNumber": {"value": "555"}}}
Analyze Results
{
"tokens": [
{
"token": "5551112233",
"start_offset": 0,
"end_offset": 10,
"type": "<NUM>",
"position": 0
}
]
}
Mapping
{
index: "user-clinics",
type: "user-clinic",
body: {properties: {id: {type: "long"}} }
}
Analyzers
const TurkishAnalyzer = {
analysis: {
filter: {
my_ascii_folding: {
type: "asciifolding",
preserve_original: true
}
},
analyzer: {
turkish_analyzer: {
tokenizer: "standard",
filter: ["lowercase", "my_ascii_folding"]
}
}
}
};
const AutoCompleteAnalyzer = {
analysis: {
filter: {
autocomplete_filter: {
type: "edge_ngram",
min_gram: 1,
max_gram: 20
}
},
analyzer: {
autocomplete_search: {
type: "custom",
tokenizer: "standard",
filter: ["lowercase"]
},
autocomplete_index: {
type: "custom",
tokenizer: "standard",
filter: ["lowercase", "autocomplete_filter"]
}
}
}
};
That's because edge_ngram only generates n-grams anchored at the beginning of each token, so only prefixes are indexed, e.g. a, as, asd, asd1, asd12, asd123 for the token asd123.
You need to change your autocomplete_filter to ngram if you also want to be able to match inside tokens, e.g. d12 or 123.
Beware, though, that this might generate a lot more tokens.
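To illustrate the difference, here is a small plain-JavaScript sketch of the two strategies applied to a single token. It is only a model of the idea, not the Elasticsearch token filters themselves, and the function names are made up.

```javascript
// Prefixes of `token` with lengths from minGram to maxGram (edge n-grams).
function edgeNgrams(token, minGram, maxGram) {
  const grams = [];
  for (let len = minGram; len <= Math.min(maxGram, token.length); len++) {
    grams.push(token.slice(0, len));
  }
  return grams;
}

// Every substring of `token` with a length from minGram to maxGram (n-grams).
function ngrams(token, minGram, maxGram) {
  const grams = [];
  for (let start = 0; start < token.length; start++) {
    for (let len = minGram; len <= maxGram && start + len <= token.length; len++) {
      grams.push(token.slice(start, start + len));
    }
  }
  return grams;
}

console.log(edgeNgrams('asd123', 1, 20)); // [ 'a', 'as', 'asd', 'asd1', 'asd12', 'asd123' ]
console.log(edgeNgrams('asd123', 1, 20).includes('d12')); // false
console.log(ngrams('asd123', 1, 20).includes('d12'));     // true
console.log(ngrams('asd123', 1, 20).length);              // 21
```

For this six-character token the edge n-grams produce 6 entries while the full n-grams produce 21, which is the token growth the warning above refers to.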

search multiple field as regexp query in elasticsearch

I am trying to search across different fields, such as title and description. When I type keywords, Elasticsearch should return a document if its description or title contains the keywords I typed. How can I achieve this?
Here is the sample code I used for one field:
query: {
regexp: {
title: `.*${q}.*`,
},
},
I also tried the one below, but it gave a syntax error.
query: {
regexp: {
title: `.*${q}.*`,
},
regexp: {
description: `.*${q}.*`,
},
},
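To do so, you need to use a bool query with should clauses, so that a document matches when at least one of the regexp clauses matches. In the request below, ${q} stands for the keywords typed by the user.
GET /<your index>/_search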
{
"query": {
"bool": {
"should": [
{
"regexp": {
"title": ".*${q}.*"
}
},
{
"regexp": {
"description": ".*${q}.*"
}
}
]
}
}
}
You can find more details in the bool query documentation.
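Since the question builds the query in JavaScript, the same body can be generated from a list of fields. This is only a sketch: buildContainsQuery is a hypothetical helper, and it inserts q into the pattern as-is, so any regexp operators the user types keep their special meaning.

```javascript
// Builds a bool/should body with one regexp clause per field.
// Assumption: `q` is inserted as-is, so regexp operators in it keep their meaning.
function buildContainsQuery(q, fields) {
  return {
    query: {
      bool: {
        should: fields.map((field) => ({
          regexp: { [field]: `.*${q}.*` },
        })),
      },
    },
  };
}

const body = buildContainsQuery('phone', ['title', 'description']);
console.log(JSON.stringify(body, null, 2));
```

The resulting object has the same shape as the request body above and can be passed as the body of the search call.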

Solr to list partial matches at top-different scenario

I have performed a search against the field company (which is n-grammed, as I need to fetch results on partial matches) with the search text "aetnahmo". I was able to bring exact matches and partial matches to the very top.
I need to handle a scenario such as the following.
Example: in the results below, I need to bring "AETNA BETTER HLTH PAHMO" and "AETNA BETTER HEALTH MAHMO" above "CIGNAHMO HEALTHPLAN - METH".
Even though these results do not contain 'aetnahmo', they do contain 'aetna'. I need to display results that start with it below the exact matches and similar matches.
"docs": [
{
"company": "AETNAHMOGNPIPA",
"score": 0.32741508
},
{
"company": "AETNAHMOPOSOUT OF NETWORK",
"score": 0.32741508
},
{
"company": "CIGNAHMO HEALTHPLAN - METH",
"score": 0.14788051
},
{
"company": "CIGNAHMOPOSOZ08",
"score": 0.14500062
},
{
"company": "CIGNAHMOPOSGNPIPA",
"score": 0.14500062
},
{
"company": "HUMANAHMO MCD",
"score": 0.14500062
},
{
"company": "AETNA BETTER HLTH PAHMO",
"score": 0.1069743
},
{
"company": "AETNA BETTER HEALTH MAHMO",
"score": 0.1069743
},
{
"company": "MOLINA HLTHCARE IL PAHMO",
"score": 0.067287326
},
{
"company": "BCBSMAHMO OUTPT",
"score": 0.065203
}
]
Is there a way to achieve this? Please help.
Phrase boosting will work here.
You'll need to use the edismax query parser along with the pf (phrase fields) parameter.
The following params appended to your query should do the trick:
&defType=edismax&pf=company
I've tested this out on Solr 5.4.1 with the dataset that you've posted above and the results are as follows:
Query: http://localhost:8983/solr/Test/select?q=company%3Aaetnamho&wt=json&indent=true&defType=edismax&pf=company&stopwords=true&lowercaseOperators=true&omitHeader=true
Response:
{
"response":{"numFound":9,"start":0,"docs":[
{
"company":"AETNAHMOPOSOUT OF NETWORK",
"id":2,
"_version_":1530885533005250560},
{
"company":"AETNA BETTER HEALTH MAHMO",
"id":8,
"_version_":1530885600368918528},
{
"company":"AETNA BETTER HLTH PAHMO",
"id":7,
"_version_":1530885592734236672},
{
"company":"AETNAHMOGNPIPA",
"id":1,
"_version_":1530885512290631680},
{
"company":"CIGNAHMO HEALTHPLAN - METH",
"id":3,
"_version_":1530885543046414336},
{
"company":"MOLINA HLTHCARE IL PAHMO",
"id":9,
"_version_":1530885608894889984},
{
"company":"CIGNAHMOPOSGNPIPA",
"id":5,
"_version_":1530885565434560512},
{
"company":"CIGNAHMOPOSOZ08",
"id":4,
"_version_":1530885555631423488},
{
"company":"HUMANAHMO MCD",
"id":6,
"_version_":1530885585061806080}]
}}
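If the request is assembled in code rather than by hand, the parameters can be put together with the standard URLSearchParams class. This is a minimal sketch: buildSelectQuery is a hypothetical helper, and it covers only the defType and pf parameters discussed above plus wt=json.

```javascript
// Builds the query string for a Solr select request with phrase boosting.
// Only q, defType, pf and wt are set; every other parameter is left at its default.
function buildSelectQuery(field, text) {
  const params = new URLSearchParams({
    q: `${field}:${text}`, // fielded query, e.g. company:aetnahmo
    defType: 'edismax',    // use the edismax query parser
    pf: field,             // phrase field used for boosting
    wt: 'json',
  });
  return params.toString();
}

console.log(buildSelectQuery('company', 'aetnahmo'));
// q=company%3Aaetnahmo&defType=edismax&pf=company&wt=json
```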

ElasticSearch MultiField Search Query

I have an endpoint that I am proxying into ElasticSearch API for a simple user search I am conducting.
/users?nickname=myUsername&email=myemail@gmail.com&name=John+Smith
Some details about these parameters are the following:
All parameters are optional
nickname can be searched as a full text search (i.e. 'myUser' would return 'myUsername')
email must be an exact match
name can be searched as full text search for each token (i.e. 'john' would return 'John Smith')
The ElasticSearch search call should treat the parameters collectively as AND'd.
Right now, I am not truly sure where to start as I am able to execute the query on each of the parameters alone, but not all together.
client.search({
index: 'users',
type: 'user',
body: {
"query": {
//NEED TO FILL THIS IN
}
}
}).then(function(resp){
//Do something with search results
});
First you need to create the mapping for this particular use case.
curl -X PUT "http://$hostname:9200/myindex/mytype/_mapping" -d '{
"mytype": {
"properties": {
"email": {
"type": "string",
"index": "not_analyzed"
},
"nickname": {
"type": "string"
},
"name": {
"type": "string"
}
}
}
}'
Here, by making email not_analyzed, you make sure that only an exact match works.
Once that is done, you need to write the query.
As we have multiple conditions, it is a good idea to use a bool query.
A bool query lets you combine multiple queries and control how they are combined.
Query:
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "qbox"
}
},
{
"prefix": {
"nickname": "qbo"
}
},
{
"match": {
"email": "me@qbox.io"
}
}
]
}
}
}
Using the prefix query, you are telling Elasticsearch to count a document as a match if one of its tokens starts with qbo.
Also, the prefix query might not be very fast; in that case you can go for an ngram analyzer - http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/analysis-ngram-tokenizer.html
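Since all three parameters are optional, the must array can be built from whichever ones are present. The following is a sketch under a few assumptions: buildUserQuery is a hypothetical helper, the nickname is lowercased because the prefix query is not analyzed while the default analyzer lowercases indexed tokens, and match_all is returned when no parameter is given.

```javascript
// Builds the search body from the optional request parameters.
// Every clause in `must` has to match, so the parameters are AND'd together.
function buildUserQuery({ nickname, email, name } = {}) {
  const must = [];
  if (name) must.push({ match: { name } });
  // Assumption: nickname is indexed with the default analyzer, which lowercases tokens.
  if (nickname) must.push({ prefix: { nickname: nickname.toLowerCase() } });
  if (email) must.push({ match: { email } });
  // Assumption: with no parameters at all, return every user.
  return { query: must.length > 0 ? { bool: { must } } : { match_all: {} } };
}

console.log(JSON.stringify(buildUserQuery({ nickname: 'myUser', name: 'john' }), null, 2));
```

The returned object can be passed as the body of the client.search call from the question.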
