Is there any solution for searching exact word and containing word both in elasticsearch - node.js

index: process.env.elasticSearchIndexName,
body: {
  query: {
    bool: {
      must: [
        {
          match_phrase: {
            title: `${searchKey}`,
          },
        },
      ],
    },
  },
},
from: (page || constants.pager.page),
size: (limit || constants.pager.limit),
I am using the above method, but the problem is that it only matches exact words in the whole text; it can't find documents that merely contain the word. For example, if title = "sweatshirt" and I type the word "shirt", the result should come back, but with the above method it doesn't.

The standard analyzer (the default analyzer if none is specified) breaks text into tokens.
For the sentence "this is a test" the tokens generated are [this, is, a, test].
A match_phrase query breaks the text into tokens using the same analyzer as the indexing analyzer and returns documents which 1. contain all the tokens and 2. have the tokens appear in the same order.
Since your text is "sweatshirt", there is a single token "sweatshirt" in the inverted index, which will match neither "sweat" nor "shirt".
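To see this for yourself, here is a small sketch (assuming the @elastic/elasticsearch v7-style Node.js client, where responses are wrapped in a body property) that runs the _analyze API with the standard analyzer and prints the single token:

const { Client } = require('@elastic/elasticsearch');
const client = new Client({ node: 'http://localhost:9200' });

async function checkStandardTokens() {
  const { body } = await client.indices.analyze({
    body: { analyzer: 'standard', text: 'sweatshirt' },
  });
  // Expect a single token, ['sweatshirt'], so a search for "shirt" can never match it.
  console.log(body.tokens.map((t) => t.token));
}

checkStandardTokens().catch(console.error);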
NGram tokenizer
The ngram tokenizer first breaks text down into words whenever it encounters one of a list of specified characters, then it emits N-grams of each word of the specified length
Mapping
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "analyzer": "my_analyzer"
      }
    }
  }
}
Query:
{
  "query": {
    "match": {
      "text": "shirt"
    }
  }
}
If you run the _analyze query
GET my_index/_analyze
{
"text": ["sweatshirt"],
"analyzer": "my_analyzer"
}
you will see that the tokens below are generated for the text "sweatshirt". The size of the tokens can be adjusted using min_gram and max_gram.
{
"tokens" : [
{
"token" : "swe",
"start_offset" : 0,
"end_offset" : 3,
"type" : "word",
"position" : 0
},
{
"token" : "wea",
"start_offset" : 1,
"end_offset" : 4,
"type" : "word",
"position" : 1
},
{
"token" : "eat",
"start_offset" : 2,
"end_offset" : 5,
"type" : "word",
"position" : 2
},
{
"token" : "ats",
"start_offset" : 3,
"end_offset" : 6,
"type" : "word",
"position" : 3
},
{
"token" : "tsh",
"start_offset" : 4,
"end_offset" : 7,
"type" : "word",
"position" : 4
},
{
"token" : "shi",
"start_offset" : 5,
"end_offset" : 8,
"type" : "word",
"position" : 5
},
{
"token" : "hir",
"start_offset" : 6,
"end_offset" : 9,
"type" : "word",
"position" : 6
},
{
"token" : "irt",
"start_offset" : 7,
"end_offset" : 10,
"type" : "word",
"position" : 7
}
]
}
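Putting the pieces together from Node.js, here is a rough sketch (again assuming the v7-style @elastic/elasticsearch client; the index and field names "my_index" and "text" mirror the mapping above) that creates the n-gram index, indexes "sweatshirt" and finds it with a plain match query for "shirt":

const { Client } = require('@elastic/elasticsearch');
const client = new Client({ node: 'http://localhost:9200' });

async function run() {
  // Create the index with the ngram analyzer from the mapping above.
  await client.indices.create({
    index: 'my_index',
    body: {
      settings: {
        analysis: {
          analyzer: { my_analyzer: { tokenizer: 'my_tokenizer' } },
          tokenizer: {
            my_tokenizer: {
              type: 'ngram',
              min_gram: 3,
              max_gram: 3,
              token_chars: ['letter', 'digit'],
            },
          },
        },
      },
      mappings: {
        properties: { text: { type: 'text', analyzer: 'my_analyzer' } },
      },
    },
  });

  // Index a sample document and wait until it is searchable.
  await client.index({
    index: 'my_index',
    body: { text: 'sweatshirt' },
    refresh: 'wait_for',
  });

  // "shirt" now matches because its 3-grams overlap with sweatshirt's 3-grams.
  const { body } = await client.search({
    index: 'my_index',
    body: { query: { match: { text: 'shirt' } } },
  });
  console.log(body.hits.hits);
}

run().catch(console.error);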
Warning: N-grams increase the size of the inverted index, so use appropriate values for min_gram and max_gram.
Another option is to use a wildcard query. For wildcards, all the documents have to be scanned to check whether the text matches the pattern, so they have low performance.
Use the wildcard search on a not_analyzed field (e.g. text.keyword) if you want the pattern to include whitespace:
{
  "query": {
    "wildcard": {
      "text": {
        "value": "*shirt*"
      }
    }
  }
}

Related

how to match a related data if incorrectly texted a keyword in elastic search

I have a document whose title is "Hard work & Success". I need to be able to search for this document. If I type "Hardwork" (without a space) it doesn't return anything, but if I type "hard work" it returns the document.
This is the query I have used:
const search = qObject.search;
const payload = {
from: skip,
size: limit,
_source: [
"id",
"title",
"thumbnailUrl",
"youtubeUrl",
"speaker",
"standards",
"topics",
"schoolDetails",
"uploadTime",
"schoolName",
"description",
"studentDetails",
"studentId"
],
query: {
bool: {
must: {
multi_match: {
fields: [
"title^2",
"standards.standard^2",
"speaker^2",
"schoolDetails.schoolName^2",
"hashtags^2",
"topics.topic^2",
"studentDetails.studentName^2",
],
query: search,
fuzziness: "AUTO",
},
},
},
},
};
If I search for the title "hard work" (with a space),
then it returns data like this:
"searchResults": [
{
"_id": "92",
"_score": 19.04531,
"_source": {
"standards": {
"standard": "3",
"categoryType": "STANDARD",
"categoryId": "S3"
},
"schoolDetails": {
"categoryType": "SCHOOL",
"schoolId": "TPS123",
"schoolType": "PUBLIC",
"logo": "91748922mn8bo9krcx71.png",
"schoolName": "Carmel CMI Public School"
},
"studentDetails": {
"studentId": 270,
"studentDp": "164646972124244.jpg",
"studentName": "Nelvin",
"about": "good student"
},
"topics": {
"categoryType": "TOPIC",
"topic": "Motivation",
"categoryId": "MY"
},
"youtubeUrl": "https://www.youtube.com/watch?v=wermQ",
"speaker": "Anna Maria Siby",
"description": "How hardwork leads to success - motivational talk by Anna",
"id": 92,
"uploadTime": "2022-03-17T10:59:59.400Z",
"title": "Hard work & Success",
}
},
]
And if I search for the keyword "Hardwork" (without a space) it doesn't find this data; I would need to add the space, or the search needs to match related data with the keyword. Is there any solution for this? Can you please help me out?
I made an example using a shingle analyzer.
Mapping:
{
  "settings": {
    "analysis": {
      "filter": {
        "shingle_filter": {
          "type": "shingle",
          "max_shingle_size": 4,
          "min_shingle_size": 2,
          "output_unigrams": "true",
          "token_separator": ""
        }
      },
      "analyzer": {
        "shingle_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "shingle_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "shingle_analyzer"
      }
    }
  }
}
I tested it with your term. Note that the token "hardwork" was generated, but the other shingles were generated as well, which may be a problem for you.
GET idx-separator-words/_analyze
{
"analyzer": "shingle_analyzer",
"text": ["Hard work & Success"]
}
Results:
{
"tokens" : [
{
"token" : "hard",
"start_offset" : 0,
"end_offset" : 4,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "hardwork",
"start_offset" : 0,
"end_offset" : 9,
"type" : "shingle",
"position" : 0,
"positionLength" : 2
},
{
"token" : "hardworksuccess",
"start_offset" : 0,
"end_offset" : 19,
"type" : "shingle",
"position" : 0,
"positionLength" : 3
},
{
"token" : "work",
"start_offset" : 5,
"end_offset" : 9,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "worksuccess",
"start_offset" : 5,
"end_offset" : 19,
"type" : "shingle",
"position" : 1,
"positionLength" : 2
},
{
"token" : "success",
"start_offset" : 12,
"end_offset" : 19,
"type" : "<ALPHANUM>",
"position" : 2
}
]
}
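Once the index is created with this mapping and the documents are reindexed, your original multi_match query should match "Hardwork" as-is. A minimal Node.js sketch (v7-style @elastic/elasticsearch client assumed; the index name idx-separator-words comes from the _analyze call above, and only the title field is shown here):

const { Client } = require('@elastic/elasticsearch');
const client = new Client({ node: 'http://localhost:9200' });

async function searchTitles(search, skip = 0, limit = 10) {
  const { body } = await client.search({
    index: 'idx-separator-words', // the index created with the shingle mapping above
    body: {
      from: skip,
      size: limit,
      query: {
        bool: {
          must: {
            multi_match: {
              // "title" is now analyzed with shingle_analyzer, so "Hardwork"
              // matches the indexed "hardwork" shingle token.
              fields: ['title^2'],
              query: search,
              fuzziness: 'AUTO',
            },
          },
        },
      },
    },
  });
  return body.hits.hits;
}

searchTitles('Hardwork').then(console.log).catch(console.error);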

ElasticSearch can't get multiple suggestor values from the same document

Can you help me please?
I have a problem with Completion Suggester in ElasticSearch
Example: I have this mapping :
PUT music
{
"mappings": {
"properties": {
"suggest": {
"type": "completion"
},
"title": {
"type": "keyword"
}
}
}
}
and index multiple suggestions for a document as follows:
PUT music/_doc/1?refresh
{
"suggest": [
{
"input": "Nirva test",
"weight": 10
},
{
"input": "Nirva hola",
"weight": 3
}
]
}
Querying: you can run this request in Kibana:
POST music/_search?pretty
{
"suggest": {
"song-suggest": {
"prefix": "nirv",
"completion": {
"field": "suggest"
}
}
}
}
In the result I retrieve only the first value, not both.
I ran the test in the Kibana Dev Tools console as well, and this is the result:
{
"took" : 1,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"suggest" : {
"song-suggest" : [
{
"text" : "nir",
"offset" : 0,
"length" : 3,
"options" : [
{
"text" : "Nirvana test",
"_index" : "music",
"_type" : "_doc",
"_id" : "1",
"_score" : 10.0,
"_source" : {
"suggest" : [
{
"input" : "Nirvana test",
"weight" : 10
},
{
"input" : "Nirvana best",
"weight" : 3
}
]
}
}
]
}
]
}
}
expected result :
"suggest" : {
"song-suggest" : [
{
"text" : "nirvana",
"offset" : 0,
"length" : 7,
"options" : [
{
"text" : "Nirvana test",
"_index" : "music",
"_type" : "_doc",
"_id" : "1",
"_score" : 10.0,
"_source" : {
"suggest" : [
{
"input" : "Nirvana test",
"weight" : 10
},
{
"input" : "Nirvana best",
"weight" : 3
}
]
}
}
]
},
{
"text" : "nirvana b",
"offset" : 0,
"length" : 9,
"options" : [
{
"text" : "Nirvana best",
"_index" : "music",
"_type" : "_doc",
"_id" : "1",
"_score" : 3.0,
"_source" : {
"suggest" : [
{
"input" : "Nirvana test",
"weight" : 10
},
{
"input" : "Nirvana best",
"weight" : 3
}
]
}
}
]
}
]
}
This is the default behavior of the current implementation. You can check #31738. Below is one of the comments explaining why it returns only one document/suggestion:
The completion suggester is document-based by design so we cannot
return one entry per matching suggestion. It is documented that it
returns documents not suggestions and a single input can be indexed in
multiple suggestions (if you have synonyms in your analyzer for
instance) so it is not trivial to differentiate a match from its
variations. Also the completion suggester does not visit all
suggestions to select the top N, it has a special structure (a
weighted FST) that can visit suggestions in the order of their scores
and early terminates the query once enough documents have been found.
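If you need one option per suggestion, a common workaround (sketched below, not the only approach) is to index each suggestion input as its own lightweight document, so the suggester can surface them independently. The index name music_suggest and the song_id field are assumptions, and the index is assumed to have the same completion mapping on suggest as music:

const { Client } = require('@elastic/elasticsearch');
const client = new Client({ node: 'http://localhost:9200' });

async function indexSuggestions() {
  // One document per suggestion input (assumed index "music_suggest").
  await client.bulk({
    refresh: true,
    body: [
      { index: { _index: 'music_suggest', _id: '1-0' } },
      { suggest: { input: 'Nirvana test', weight: 10 }, song_id: 1 },
      { index: { _index: 'music_suggest', _id: '1-1' } },
      { suggest: { input: 'Nirvana best', weight: 3 }, song_id: 1 },
    ],
  });

  // The same completion query now yields one option per suggestion document.
  const { body } = await client.search({
    index: 'music_suggest',
    body: {
      suggest: {
        'song-suggest': { prefix: 'nirv', completion: { field: 'suggest' } },
      },
    },
  });
  console.log(body.suggest['song-suggest'][0].options.map((o) => o.text));
}

indexSuggestions().catch(console.error);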

Elastic Search multi match query can't ignore special characters

I have a name field with the value "abc_name", so when I search for "abc_" I get proper results, but when I search for "abc_##£&-#&" I still get the same results. I want my query to ignore special characters that don't match my query.
My query has:
multi_match
type as cross_fields
operator AND
I am using the standard search_analyzer for my fields.
And I want to keep this mapping structure as it is, otherwise it will affect my other search behaviour:
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"analyzer": "autocomplete",
"search_analyzer": "standard"
}
Please see the sample below, where I've created a custom analyzer that should fit your use case.
Sample Mapping:
PUT some_test_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "tokenizer": "custom_tokenizer",
          "filter": ["lowercase", "3_5_edge_ngram"]
        }
      },
      "tokenizer": {
        "custom_tokenizer": {
          "type": "pattern",
          "pattern": "\\w+_+[^a-zA-Z\\d\\s_]+|\\s+"      <---- Note this pattern
        }
      },
      "filter": {
        "3_5_edge_ngram": {
          "type": "edge_ngram",
          "min_gram": 3,
          "max_gram": 5
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "my_field": {
        "type": "text",
        "analyzer": "my_custom_analyzer"
      }
    }
  }
}
The above pattern simply ignores tokens of the form abc_$%^^##. As a result, such a token would not be indexed.
Note the way the analyzer works:
First it executes the tokenizer.
Then it applies the edge_ngram filter to the tokens generated.
You can verify this by temporarily removing the edge_ngram filter from the above mapping and checking what tokens get generated via the Analyze API, which would be as below:
POST some_test_index/_analyze
{
"analyzer": "my_custom_analyzer",
"text": "abc_name asda efg_!##!## 1213_adav"
}
Tokens generated:
{
"tokens" : [
{
"token" : "abc_name",
"start_offset" : 0,
"end_offset" : 8,
"type" : "word",
"position" : 0
},
{
"token" : "asda",
"start_offset" : 9,
"end_offset" : 13,
"type" : "word",
"position" : 1
},
{
"token" : "1213_adav",
"start_offset" : 25,
"end_offset" : 34,
"type" : "word",
"position" : 2
}
]
}
Note that the token efg_!##!## has been removed.
I've added the edge_ngram filter because you would want the search to succeed for abc_ when the token generated by the tokenizer is abc_name.
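If you want to see exactly what the full analyzer (tokenizer plus edge_ngram filter) emits, here is a small Node.js sketch (v7-style @elastic/elasticsearch client assumed) against the index above:

const { Client } = require('@elastic/elasticsearch');
const client = new Client({ node: 'http://localhost:9200' });

async function inspectFullAnalyzer() {
  const { body } = await client.indices.analyze({
    index: 'some_test_index',
    body: {
      analyzer: 'my_custom_analyzer',
      text: 'abc_name asda efg_!##!## 1213_adav',
    },
  });
  // For "abc_name" expect edge n-grams of length 3 to 5, such as "abc", "abc_", "abc_n".
  console.log(body.tokens.map((t) => t.token));
}

inspectFullAnalyzer().catch(console.error);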
Sample Document:
POST some_test_index/_doc/1
{
"my_field": "abc_name asda efg_!##!## 1213_adav"
}
Query Request:
Use-case 1:
POST some_test_index/_search
{
"query": {
"match": {
"my_field": "abc_"
}
}
}
Use-case-2:
POST some_test_index/_search
{
"query": {
"match": {
"my_field": "efg_!##!##"
}
}
}
Responses:
Response for use-case-1:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.47992462,
"hits" : [
{
"_index" : "some_test_index",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.47992462,
"_source" : {
"my_field" : "abc_name asda efg_!##!## 1213_adav"
}
}
]
}
}
Response for use-case-2:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
Updated Answer:
Create your mapping as follows based on the index I've created and let me know if that works:
PUT some_test_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "tokenizer": "punctuation",
          "filter": ["lowercase"]
        }
      },
      "tokenizer": {
        "punctuation": {
          "type": "pattern",
          "pattern": "\\w+_+[^a-zA-Z\\d\\s_]+|\\s+"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "my_field": {
        "type": "text",
        "analyzer": "autocomplete",                <----- Assuming you already have this in your settings
        "search_analyzer": "my_custom_analyzer"    <----- Note this
      }
    }
  }
}
Please try and let me know if this works for all your use-cases.
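To sanity-check the search-side tokenization without depending on any particular index, the _analyze API also accepts an inline tokenizer definition. A hedged Node.js sketch (v7-style client assumed):

const { Client } = require('@elastic/elasticsearch');
const client = new Client({ node: 'http://localhost:9200' });

async function preview(text) {
  const { body } = await client.indices.analyze({
    body: {
      // Same pattern as in the mapping above, passed inline to _analyze.
      tokenizer: {
        type: 'pattern',
        pattern: '\\w+_+[^a-zA-Z\\d\\s_]+|\\s+',
      },
      filter: ['lowercase'],
      text,
    },
  });
  return body.tokens.map((t) => t.token);
}

(async () => {
  console.log(await preview('abc_name'));    // -> [ 'abc_name' ]
  console.log(await preview('abc_##£&-#&')); // -> [] (the whole input is treated as a separator)
})().catch(console.error);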

Mongodb query join two collections

I'm trying to write a query against my database in order to get some info that involves two collections.
First, I have one collection, called Collectables, that stores all the available items a user can get using an App.
For example, this collection can have... 100 items. This is the maximum number of items.
This is one document of this collection (called collectables):
{
"_id" : ObjectId("5d387ecfbb676b173aa57fe3"),
"img" : "some url",
"name" : "La 5",
"__v" : 0,
"amount" : 17,
"available" : 16,
"collec" : ObjectId("5d36c0c34c86991db93bd7c8"),
"gen" : 3,
"metadata" : {},
"position" : 1
}
Then I have another collection called AppUsers. In this collection I store all info related to the user. Each user of the App has his own record here. Besides meta info about the user such as alias, avatar, age... I have one field called collectables, which is an array. Here I store which collectables each user has.
For example, if one user has 10 collectables (from the other collection), I have 10 entries in this array with that info. It's possible that one user "wins" the same item (collectable) twice or more, so in this collection I store a count field with the total. For example, a user wins the collectable with id 1 the first time, so I add the entry in the array with count 1. If the user then wins the same collectable again, the count is 2... and so on.
This is an example of one user with 3 collectables, but several of each item:
{
"_id" : ObjectId("5d36dc9445526a215c4eff52"),
"twitter" : "1",
"alias" : "ViktorCrowley",
"__v" : 25,
"collectables" : [
{
"count" : 12,
"collectable" : ObjectId("5d36c1ba4c86991db93bd7e7")
},
{
"count" : 25,
"collectable" : ObjectId("5d36c13d4c86991db93bd7c9")
},
{
"count" : 8,
"collectable" : ObjectId("5d381e122f25221126a98f9c")
}
]
}
So in this case this user has, for example, 3 different items (collectables). But imagine that the total number of collectables in the first collection is 100.
Now, what I'm looking for: I need a query that gives me (paginated) the items from the first collection (collectables), and in case the user already has one of those items, marks it with the total count. I mean, I want all the items from the first collection with a new field called count. If the user doesn't have any entry for that item in his array, count will be 0, and if the user has, for example, 4 of that item, the count will be 4.
Something like this:
[
{
"_id" : ObjectId("5d387ecfbb676b173aa57fe3"),
"img" : "some url",
"name" : "THe one",
"__v" : 0,
"amount" : 17,
"available" : 16,
"collec" : ObjectId("5d36c0c34c86991db93bd7c8"),
"gen" : 3,
"metadata" : {},
"position" : 1,
"count" : 0
},
{
"_id" : ObjectId("5d387ecfbb676b173aa57fe3"),
"img" : "some url",
"name" : "The two",
"__v" : 0,
"amount" : 17,
"available" : 16,
"collec" : ObjectId("5d36c0c34c86991db93bd7c8"),
"gen" : 3,
"metadata" : {},
"position" : 2,
"count" : 1
},
{
"_id" : ObjectId("5d387ecfbb676b173aa57fe4"),
"img" : "some url",
"name" : "The Three",
"__v" : 0,
"amount" : 17,
"available" : 16,
"collec" : ObjectId("5d36c0c34c86991db93bd7c8"),
"gen" : 3,
"metadata" : {},
"position" : 3,
"count" : 0
},
{
"_id" : ObjectId("5d387ecfbb676b173aa57fe4"),
"img" : "Some url",
"name" : "La 5",
"__v" : 0,
"amount" : 17,
"available" : 16,
"collec" : ObjectId("5d36c0c34c86991db93bd7c8"),
"gen" : 3,
"metadata" : {},
"position" : 4,
"count" : 12
}
I tried several things using aggregate and $lookup but I can't make it work.
The only thing I could get was to retrieve the info from AppUser along with the total count.
Something like this (with mongoose):
AppUser.aggregate([
{ $match: matchQuery },
{$unwind: "$collectables"},
{
$lookup:
{
from: "collectables",
localField: "collectables.collectable",
foreignField: "_id",
as: "result"
}
},
{ $sort: { "result.position": 1 } },
{$unwind: "$result"},
{ $addFields : { "result.count" : "$collectables.count" }
},
{ $replaceRoot: { newRoot: "$result" } },
{ $skip: size * (page - 1) },
{ $limit: size }
]).exec((err, result) =>
{
if (err)
{
console.log(err);
return res.status(401).send({ success: false });
}
else {
return res.status(200).send({ success: true, result });
}
});
So I need help, because I have spent three days on this and I can't get anything nice...
Thanks in advance...
The query below will bring us the expected output:
db.appuser.aggregate([
  { $unwind: "$collectables" },
  {
    $lookup: {
      from: "collectables",
      localField: "collectables.collectable",
      foreignField: "_id",
      as: "result"
    }
  },
  {
    $project: {
      result: 1,
      flag: { "$gt": [ { "$size": "$result" }, 0 ] },
      "collectables.count": 1,
      alias: 1,
      _id: 0
    }
  },
  {
    $match: { flag: true }
  }
]);
Unwind the appuser.collectables array, then do a $lookup between the collectables and appuser collections, then use $project and $match to get the desired result.
In the $project stage a flag is added so the desired documents can be filtered by checking the size of the result array from the $lookup output.
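For completeness, this is roughly how the same pipeline could be wired into the paginated mongoose call from the question (AppUser, matchQuery, page, size and res are the names used there; treat this as a sketch, not a drop-in implementation):

const pipeline = [
  { $match: matchQuery },
  { $unwind: '$collectables' },
  {
    $lookup: {
      from: 'collectables',
      localField: 'collectables.collectable',
      foreignField: '_id',
      as: 'result',
    },
  },
  {
    $project: {
      result: 1,
      flag: { $gt: [{ $size: '$result' }, 0] },
      'collectables.count': 1,
      alias: 1,
      _id: 0,
    },
  },
  { $match: { flag: true } },
  // Pagination, as in the original code.
  { $skip: size * (page - 1) },
  { $limit: size },
];

AppUser.aggregate(pipeline)
  .then((result) => res.status(200).send({ success: true, result }))
  .catch((err) => {
    console.log(err);
    res.status(401).send({ success: false });
  });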
Sample Data used
db.collectables.find()
{
"_id": "5d387ecfbb676b173aa57fe3",
"img": "some url",
"name": "La 5",
"__v": 0,
"amount": 17,
"available": 16,
"collec": "5d36c0c34c86991db93bd7c8",
"gen": 3,
"metadata": {
},
"position": 1
}
db.appuser.find()
[{
"_id": "5d36dc9445526a215c4eff52",
"twitter": "1",
"alias": "ViktorCrowley",
"__v": 25,
"collectables": [
{
"count": 12,
"collectable": "5d36c1ba4c86991db93bd7e7"
},
{
"count": 25,
"collectable": "5d36c13d4c86991db93bd7c9"
},
{
"count": 8,
"collectable": "5d381e122f25221126a98f9c"
}
]
},
{
"_id": "5d36dc9445526a215c4efga1",
"twitter": "1",
"alias": "John",
"__v": 11,
"collectables": [
{
"count": 10,
"collectable": "5d387ecfbb676b173aa57fe3"
}
]
},
{
"_id": "6dc2dc9445526a215c4efga1",
"twitter": "1",
"alias": "Alice",
"__v": 11,
"collectables": [
{
"count": 25,
"collectable": "5d387ecfbb676b173aa57fe3"
},
{
"count": 3,
"collectable": "5d36c13d4c86991db9312349"
},
{
"count": 5,
"collectable": "5d381e122f25221126a9711c"
}
]
}]
Final Output
{
"alias": "John",
"collectables": {
"count": 10
},
"result": [
{
"_id": "5d387ecfbb676b173aa57fe3",
"img": "some url",
"name": "La 5",
"__v": 0,
"amount": 17,
"available": 16,
"collec": "5d36c0c34c86991db93bd7c8",
"gen": 3,
"metadata": {
},
"position": 1
}
],
"flag": true
}
{
"alias": "Alice",
"collectables": {
"count": 25
},
"result": [
{
"_id": "5d387ecfbb676b173aa57fe3",
"img": "some url",
"name": "La 5",
"__v": 0,
"amount": 17,
"available": 16,
"collec": "5d36c0c34c86991db93bd7c8",
"gen": 3,
"metadata": {
},
"position": 1
}
],
"flag": true
}
Hope it helps!

Multiple $group in mongoDB

I have different records in MongoDB. Here is a small example:
{_id:"sad547er4w2v5x85b8", name:"Jhon", jobTime:600, floor:2, dept:5, age:25},
{_id:"xcz547wer4xcvcx1g2", name:"Alex", jobTime:841, floor:4, dept:1, age:55},
{_id:"xcnwep2321954ldfsl", name:"Alice", jobTime:100, floor:3, dept:3, age:55},
{_id:"23s3ih94h548jhfk2u", name:"Anne", jobTime:280, floor:2, dept:8, age:22},
{_id:"03dfsk9342hjwq1503", name:"Alexa", jobTime:355, floor:2, dept:6, age:25}
I tried to obtain this output, but I don't know how to group twice to get that structure.
{[
{age:22, floors:[{floor:2,persons:[{name:"Anne",jobTime:280,dept:8}]}]},
{age:25, floors:[{floor:2,persons:[{name:"Jhon",jobTime:600,dept:5},{name:"Alexa",jobTime:355,dept:6}]}]},
{age:55, floors:[{floor:3,persons:[{name:"Alex",jobTime:841,dept:1}]},{floor:4,persons:[{name:"Alice",jobTime:100,dept:3}]}]}
]}
Exactly. Use "two" $group stages
collection.aggregate([
{ "$group": {
"_id": {
"age": "$age",
"floor": "$floor",
},
"persons": { "$push": {
"name": "$name",
"jobTime": "$jobTime",
"dept": "$dept"
}}
}},
{ "$group": {
"_id": "$_id.age",
"floors": { "$push": {
"floor": "$_id.floor",
"persons": "$persons"
}}
}}
],function(err,results) {
// deal with results here
})
Which produces:
{
"_id" : 25,
"floors" : [
{ "floor" : 2,
"persons" : [
{ "name" : "Jhon", "jobTime" : 600, "dept" : 5 },
{ "name" : "Alexa", "jobTime" : 355, "dept" : 6 }
]
}
]
},
{
"_id" : 55,
"floors" : [
{ "floor" : 3,
"persons" : [
{ "name" : "Alice", "jobTime" : 100, "dept" : 3 }
]
},
{ "floor" : 4,
"persons" : [
{ "name" : "Alex", "jobTime" : 841, "dept" : 1 }
]
}
]
},
{
"_id" : 22,
"floors" : [
{ "floor" : 2,
"persons" : [
{ "name" : "Anne", "jobTime" : 280, "dept" : 8 }
]
}
]
}
So the initial $group is on a compound key including the detail down to the items you want to add to the initial "array", for "persons". Then the second $group takes only part of the initial _id for its key and again "pushes" the content into a new array.
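For reference, here is the same two-stage pipeline with the current Node.js MongoDB driver, which returns a cursor/promise instead of taking a callback (the database and collection names here are assumptions):

const { MongoClient } = require('mongodb');

async function groupByAgeAndFloor(uri = 'mongodb://localhost:27017') {
  const client = new MongoClient(uri);
  try {
    await client.connect();
    const collection = client.db('test').collection('people');

    const results = await collection
      .aggregate([
        {
          // First stage: group by age + floor and collect the people in each group.
          $group: {
            _id: { age: '$age', floor: '$floor' },
            persons: {
              $push: { name: '$name', jobTime: '$jobTime', dept: '$dept' },
            },
          },
        },
        {
          // Second stage: regroup by age only, pushing one entry per floor.
          $group: {
            _id: '$_id.age',
            floors: {
              $push: { floor: '$_id.floor', persons: '$persons' },
            },
          },
        },
        { $sort: { _id: 1 } }, // optional: order the groups by age
      ])
      .toArray();

    return results;
  } finally {
    await client.close();
  }
}

groupByAgeAndFloor().then(console.log).catch(console.error);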
