I'm building a search engine that finds places in a large database stored in Elasticsearch. I want the results to be based on the places nearest to the user's position, so I used the completion suggester with its context option, but I'm having trouble with the implementation.
I followed the documentation step by step, but the query always returns an empty array.
Here's how I'm creating my index:
location: {
type: "geo_point"
},
context2: {
type: "completion",
analyzer: "my_analyzer",
contexts: {
name: "location",
type: "geo",
precision: 4
}
},
And this is how I'm performing my search:
contextSuggester: {
prefix: req.body['q'],
completion: {
field: "context2",
size : 7,
skip_duplicates:true,
contexts: {
location: {
lat: 43.662,
lon: -79.380
}
},
fuzzy: {
fuzziness: "auto"
}
}
}
I am using the elasticsearch module in my Node.js app to query my index with fuzzy completion. The text I'm trying to search for is Rome–Fiumicino Leonardo da Vinci International Airport. When I search for the full term I get no results, but if I cut the term to 50 characters, it is found and results are returned.
const result = await elasticsearch.search({
index: 'myIndex',
body: {
suggest: {
fuzzinessZero: {
text,
completion: {
field: 'name_suggest',
fuzzy: {
fuzziness: 0,
},
contexts,
},
},
fuzzinessOne: {
text,
completion: {
field: 'name_suggest',
fuzzy: {
fuzziness: 1,
},
contexts,
},
},
fuzzinessTwo: {
text,
completion: {
field: 'name_suggest',
fuzzy: {
fuzziness: 2,
},
contexts,
},
},
},
}
})
This is the result I get in fuzzinessOne
As you can see, the text field in the result is cut to 50 characters (maybe that's the issue). Inside _source I get back all the inputs that are used for the search, and one of them is the full, exact term I searched for, along with all the other available combinations.
It is worth mentioning that I'm using AWS OpenSearch.
These are the settings I use to create the index:
settings: {
analysis: {
filter: {
autocomplete_filter: {
type: 'edge_ngram',
min_gram: 2,
max_gram: 20,
},
shingle_filter: {
type: 'shingle',
max_shingle_size: 3,
},
},
analyzer: {
autocomplete: {
type: 'custom',
tokenizer: 'standard',
filter: ['lowercase', 'shingle_filter', 'asciifolding'],
},
},
},
}
You are facing this issue because the max_input_length parameter defaults to 50.
Below is the description of this parameter from the documentation:
Limits the length of a single input, defaults to 50 UTF-16 code
points. This limit is only used at index time to reduce the total
number of characters per input string in order to prevent massive
inputs from bloating the underlying datastructure. Most use cases
won’t be influenced by the default value since prefix completions
seldom grow beyond prefixes longer than a handful of characters.
You can keep this default behaviour, or you can update your index mapping with a larger value for max_input_length and reindex your data:
{
"mappings": {
"dynamic": "false",
"properties": {
"namesuggest": {
"type": "completion",
"analyzer": "keyword_lowercase_analyzer",
"preserve_separators": true,
"preserve_position_increments": true,
"max_input_length": 100,
"contexts": [
{
"name": "searchable",
"type": "CATEGORY"
}
]
}
}
},
"settings": {
"index": {
"mapping": {
"ignore_malformed": "true"
},
"refresh_interval": "5s",
"analysis": {
"analyzer": {
"keyword_lowercase_analyzer": {
"filter": [
"lowercase"
],
"type": "custom",
"tokenizer": "keyword"
}
}
},
"number_of_replicas": "0",
"number_of_shards": "1"
}
}
}
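To make the effect of the 50-character limit concrete, here is a minimal sketch that models index-time truncation in plain JavaScript. It is a simplification that ignores analysis and fuzziness, and the helper names truncateInput and prefixMatches are hypothetical, not part of any Elasticsearch or OpenSearch client.

```javascript
// Simplified model of index-time truncation; the real logic lives inside
// Elasticsearch/OpenSearch and is not reproduced here.
const DEFAULT_MAX_INPUT_LENGTH = 50; // default quoted from the documentation above

// Hypothetical helper: cut an input to at most maxInputLength UTF-16 code units.
function truncateInput(input, maxInputLength = DEFAULT_MAX_INPUT_LENGTH) {
  return input.slice(0, maxInputLength);
}

// Hypothetical helper: a completion query can only match if it is a prefix
// of what was actually stored.
function prefixMatches(query, storedInput) {
  return storedInput.startsWith(query);
}

const fullName = 'Rome–Fiumicino Leonardo da Vinci International Airport';
const storedByDefault = truncateInput(fullName);
const storedWithLimit100 = truncateInput(fullName, 100);

console.log(fullName.length);
console.log(prefixMatches(fullName, storedByDefault));
console.log(prefixMatches(fullName.slice(0, 50), storedByDefault));
console.log(prefixMatches(fullName, storedWithLimit100));
```

Under this model the full name is longer than the stored input when the default limit applies, so it cannot match as a prefix, while the query cut to 50 characters can. With the limit raised to 100, the whole name is stored and the full query matches again.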
So basically I have a model with a bunch of string fields, like so:
const Schema: Schema = new Schema(
{
title: {
type: String,
trim: true
},
description: {
type: String,
trim: true
},
...
}
);
Schema.index({ '$**': 'text' });
export default mongoose.model('Watch', Schema);
where I index all of them.
This schema is used as a ref for another model, so when I search I run a query like this, where user is an instance of the other model:
const { search, limit = 5 } = req.query;
const query = search && { match: { $text: { $search: new RegExp(search, 'i') } } };
const { schemaRes } = await user
.populate({
path: 'schema',
...query,
options: {
limit
}
})
.execPopulate();
The search itself seems to work, but when the search terms become more specific, the results do not seem to narrow down accordingly.
Example
db
{ title: 'Rolex', name: 'Submariner', description: 'Nice' }
{ title: 'Rolex', name: 'Air-King', description: 'Nice' }
When the search param is Rolex I get both items, which is fine. But when the search param becomes Rolex Air-King, I still get both items, which is not what I want: I would rather get only one.
Is there something I could do to achieve this?
Returning both items is correct, since both items match your search params, but with different similarity scores.
You can output the similarity score to help sort the results.
user.aggregate([
{ $match: { $text: { $search: "Rolex Air-King" } } },
{ $set: { score: { $meta: "textScore" } } }
])
// new RegExp("Rolex Air-King", 'i') is not necessary and even invalid,
// as $search accepts string and is already case-insensitive by default
The query will return
[{
"_id": "...",
"title": "Rolex",
"name": "Air-King",
"description": "Nice",
"score": 2.6
},
{
"_id": "....",
"title": "Rolex",
"name": "Submariner",
"description": "Nice",
"score": 1.1
}]
Since the second result item matches your search query (even partially), MongoDB returns it.
You could use the score to help sort the items. But determining the right threshold to filter the result is complex, as the score depends on the word count as well.
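As a rough illustration of sorting and filtering by score on the application side, the sketch below keeps only the results whose score is close to the best one. The helper keepNearTopScore and its minRatio parameter are assumptions made for this example, not MongoDB features, and the right ratio would have to be tuned on real data.

```javascript
// Hypothetical helper: sort by text score and keep results close to the best one.
// minRatio is an assumed tuning knob, not a MongoDB option.
function keepNearTopScore(results, minRatio = 0.5) {
  if (results.length === 0) return [];
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...results].sort((a, b) => b.score - a.score);
  const topScore = sorted[0].score;
  return sorted.filter((doc) => doc.score >= topScore * minRatio);
}

// Scores taken from the example output above.
const results = [
  { name: 'Submariner', score: 1.1 },
  { name: 'Air-King', score: 2.6 },
];
const filtered = keepNearTopScore(results);
console.log(filtered.map((doc) => doc.name));
```

A relative cutoff like this avoids a fixed absolute threshold, but it is still sensitive to how many words the query contains.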
On a side note: you can assign different weights to the fields if they are not equally important:
https://docs.mongodb.com/manual/tutorial/control-results-of-text-search/
I have tons of articles in various stores. Some of these articles are own-brand articles and should be ranked higher than other articles in my Elasticsearch search results (both own-brand and non-own-brand articles should still be shown).
I already tried different approaches with field_value_factor, but that doesn't seem to work well with a boolean field.
I also tried the solution from "Boosting an elasticsearch result based on a boolean field value", but that didn't work well for me. The own-brand results were still ranked way lower than a lot of non-own-brand articles.
Index:
schema: {
articleId: { type: 'text' },
brandId: { type: 'text' },
brandName: { type: 'text' },
countryId: { type: 'text' },
description: { type: 'text' },
isOwnBrand: { type: 'boolean' },
stores: { type: 'keyword' },
},
};
Query:
query: {
function_score: {
query: {
bool: {
must: {
multi_match: {
query: searchterm,
fields: ['name^5', 'name.ngram'],
fuzziness: 'auto',
prefix_length: 2,
},
},
filter: [{ term: { stores: storeId } }],
},
},
},
},
};
The results should put articles with isOwnBrand = true at the top while still showing relevant articles with isOwnBrand = false below.
I am a bit lost on how to handle this.
You can use field_value_factor. The query below should work even on a boolean field. Try it:
{
"query": {
"function_score": {
"query": {...}, #your normal query as in question
#add below after query
"field_value_factor": {
"field": "isOwnBrand",
"factor": 1.2,
"modifier": "sqrt",
"missing": 1
}
}
}
}
One caveat I can think of but haven't tested: since false is 0, the field_value_factor above will score all documents with isOwnBrand = false down to 0, which messes up the scoring. You could either make isOwnBrand a numeric field and set the priority starting at 1,
or you could use script_score.
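A minimal, untested sketch of the script_score alternative is shown below. It builds the query body in JavaScript and assumes that isOwnBrand is a boolean field with doc values. The helper name boostOwnBrand, the default boost of 2 and the sample match query are all assumptions for illustration.

```javascript
// Hypothetical helper: wraps an existing query in function_score with a script_score.
// The default boost of 2 is an assumption; tune it against your own data.
function boostOwnBrand(baseQuery, ownBrandBoost = 2) {
  return {
    function_score: {
      query: baseQuery,
      script_score: {
        script: {
          // Own-brand documents get the boost; everything else keeps a neutral
          // factor of 1, so no document is multiplied down to 0.
          source: "doc['isOwnBrand'].size() != 0 && doc['isOwnBrand'].value ? params.boost : 1",
          params: { boost: ownBrandBoost },
        },
      },
      // Multiply the text relevance score by the script result.
      boost_mode: 'multiply',
    },
  };
}

// Placeholder base query; in practice this would be the bool query from the question.
const query = boostOwnBrand({ match: { description: 'coffee' } });
console.log(JSON.stringify(query, null, 2));
```

Because the script returns 1 instead of 0 for non-own-brand articles, their relevance scores stay intact and they still appear below the boosted ones.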
I'm setting up a Node.js GraphQL API and I've hit a blocking point regarding the output type of one of my resources.
The feature is a form that contains three different levels:
Level 1- formTemplate
Level 2- formItems (templateId, type (video, image, question) - 1-N relation with formTemplate)
Level 3- formQuestions (0-1 relation with formItem if and only if formItems.type is 'question')
My GraphQL resource returns all the templates in the database, so it's an array that, for each template, returns all its items, and each item of type "question" needs to return an array containing the associated question.
My problem is: I really don't know how to return an empty object type for the formItems whose type is different from "question", or whether there is a better approach for this kind of situation.
I've looked at GraphQL directives and inline fragments, but I think this really needs to be managed on the backend side because it should be transparent to the API consumer.
const formTemplate = new GraphQLObjectType({
name: 'FormTemplate',
fields: () => {
return {
id: {
type: new GraphQLNonNull(GraphQLInt)
},
authorId: {
type: new GraphQLNonNull(GraphQLInt)
},
name: {
type: new GraphQLNonNull(GraphQLString)
},
items: {
type: new GraphQLList(formItem),
resolve: parent => FormItem.findAllByTemplateId(parent.id)
}
}
}
})
const formItem = new GraphQLObjectType({
name: 'FormItem',
fields: () => {
return {
id: {
type: new GraphQLNonNull(GraphQLInt)
},
templateId: {
type: new GraphQLNonNull(GraphQLInt)
},
type: {
type: new GraphQLNonNull(GraphQLString)
},
question: {
type: formQuestion,
resolve: async parent => FormQuestion.findByItemId(parent.id)
}
}
}
})
const formQuestion = new GraphQLObjectType({
name: 'FormQuestion',
fields: () => {
return {
id: {
type: new GraphQLNonNull(GraphQLInt)
},
itemId: {
type: new GraphQLNonNull(GraphQLInt)
},
type: {
type: new GraphQLNonNull(GraphQLString)
},
label: {
type: new GraphQLNonNull(GraphQLString)
}
}
}
})
My GraphQL request:
query {
getFormTemplates {
name
items {
type
question {
label
type
}
}
}
}
What I expect is:
{
"data": {
"getFormTemplates": [
{
"name": "Form 1",
"items": [
{
"type": "question",
"question": {
"label": "Question 1",
"type": "shortText"
}
},
{
"type": "rawContent",
"question": {}
}
]
}
]
}
}
I'd design your "level 2" items so that the "type" property corresponds to actual GraphQL types that implement a common interface. Also, in general, I'd design the schema so that it has actual links to neighboring items rather than their identifiers.
So if every form item possibly has an associated template, you can make that a GraphQL interface:
interface FormItem {
id: ID!
template: FormTemplate
}
Then you can have three separate types for your three kinds of items
# Skipping VideoItem
type ImageItem implements FormItem {
id: ID!
template: FormTemplate
src: String!
}
type QuestionItem implements FormItem {
id: ID!
template: FormTemplate
questions: [FormQuestion!]!
}
The other types you describe would be:
type FormTemplate {
id: ID!
author: Author!
name: String!
items: [FormItem!]!
}
type FormQuestion {
id: ID!
question: Question
type: String!
label: String!
}
The other tricky thing is that, since not all form items are questions, you have to state explicitly in your query that you're interested in questions to get the question-specific fields. Your query might look like
query {
getFormTemplates {
name
items {
__typename # a GraphQL builtin that gives the type of this object
... on QuestionItem {
questions {
label
type
}
}
}
}
}
The ... on QuestionItem syntax is an inline fragment, and you can similarly use it to pick out the fields specific to other kinds of form items.
Thank you David for your answer!
I figured out how to solve my problem using inline fragments and union types, which seem to be the best fit for this use case. Here is the code:
const formItemObjectType = new GraphQLUnionType({
name: 'FormItemObject',
types: [formItemContent, formItemQuestion],
resolveType(parent) {
switch (parent.type) {
case ('question'): return formItemQuestion
default: return formItemContent
}
}
})
and the GraphQL query using inline fragment:
query {
getFormTemplates {
name
items {
...on FormItemContent {
type,
meta
}
...on FormItemQuestion {
type,
meta,
question {
label
}
}
}
}
}
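The dispatch inside resolveType can be tried out on its own, without a GraphQL server. The sketch below is a hypothetical standalone variant that returns type names as strings, with the names taken from the inline fragments in the query above. Some graphql-js versions expect resolveType to return the type name rather than the type object, so check the version you use.

```javascript
// Hypothetical standalone version of the resolveType logic from the union above.
// It returns type names as strings instead of GraphQLObjectType instances.
function resolveFormItemTypeName(item) {
  switch (item.type) {
    case 'question':
      return 'FormItemQuestion';
    default:
      // Video, image and any other content fall back to the content type.
      return 'FormItemContent';
  }
}

// Made-up sample rows shaped like the formItems described in the question.
const items = [
  { id: 1, type: 'question' },
  { id: 2, type: 'rawContent' },
  { id: 3, type: 'video' },
];
console.log(items.map(resolveFormItemTypeName));
```

Keeping the decision in one small function makes it easy to unit test and to extend if more item kinds need their own type later.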
How can I search through multiple fields with Elasticsearch? I've tried many queries, but none of them worked. I want the search to be case-insensitive, and one field is more important than the other. My query looks like this:
const eQuery = {
query: {
query_string: {
query: `*SOME_CONTENT_HERE*`,
fields: ['title^3', 'description'],
default_operator: 'OR',
},
},
}
esClient.search(
{
index: 'movies',
body: eQuery,
},
function(error, response) {
},
)
Mapping looks like this:
{
mappings: {
my_index_type: {
dynamic_templates: [{ string: { mapping: { type: 'keyword' }, match_mapping_type: 'string' } }],
properties: {
created_at: { type: 'long' },
description: { type: 'keyword' },
title: { type: 'keyword' },
url: { type: 'keyword' },
},
},
_default_: {
dynamic_templates: [{ string: { mapping: { type: 'keyword' }, match_mapping_type: 'string' } }],
},
},
}
The problem is the type: keyword in your mapping for the description and title fields. Keyword fields are not analyzed, i.e. they store the indexed data exactly as it was sent to Elasticsearch. This is useful when you want to match things like unique IDs. Read: https://www.elastic.co/guide/en/elasticsearch/reference/current/keyword.html
You should read about analyzers in Elasticsearch. You can easily create custom analyzers that transform the data you send them in different ways, such as lowercasing everything before it is indexed or searched.
Luckily, there are pre-configured analyzers for basic operations such as lowercasing. If you change the type of your description and title fields to type: text, your query would work.
Read: https://www.elastic.co/guide/en/elasticsearch/reference/current/text.html
Also, I see you have dynamic templates configured for your index. So, if you do not specify the mappings for your index explicitly, all your string fields (like description and title) will be treated as type: keyword.
If you build your index like this:
PUT index_name
{
"mappings": {
"index_type": {
"properties": {
"description": {"type": "text"},
"title": {"type": "text"}, ...
}
}
}
}
your problem should be solved. This is because type: text fields are analyzed by the standard analyzer by default which lowercases the input, among other things. Read: https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-standard-analyzer.html
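To see why the field type matters, here is a rough simulation of the two behaviours in plain JavaScript. It is only an approximation: the real standard analyzer uses Unicode text segmentation, and the function names and the sample title are made up for this example.

```javascript
// Rough stand-in for a keyword field: the whole value is kept as one exact term.
function keywordTerms(value) {
  return [value];
}

// Rough stand-in for the standard analyzer: lowercase, then split on runs of
// characters that are neither letters nor digits. The real analyzer uses
// Unicode text segmentation, which this does not replicate.
function standardLikeTerms(value) {
  return value.toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

// A term lookup matches only if that exact term exists in the index.
function hasTerm(terms, term) {
  return terms.includes(term);
}

const title = 'The Dark Knight'; // made-up sample value
console.log(hasTerm(keywordTerms(title), 'knight'));
console.log(hasTerm(standardLikeTerms(title), 'knight'));
```

With the keyword-style indexing, only the exact original string is searchable, while the analyzed version exposes each lowercased word as its own term.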