I have a document whose title is "Hard work & Success", and I need to search for it. If I type "Hardwork" (without a space), nothing is returned, but if I type "hard work", the document is returned.
This is the query I used:
const search = qObject.search;
const payload = {
from: skip,
size: limit,
_source: [
"id",
"title",
"thumbnailUrl",
"youtubeUrl",
"speaker",
"standards",
"topics",
"schoolDetails",
"uploadTime",
"schoolName",
"description",
"studentDetails",
"studentId"
],
query: {
bool: {
must: {
multi_match: {
fields: [
"title^2",
"standards.standard^2",
"speaker^2",
"schoolDetails.schoolName^2",
"hashtags^2",
"topics.topic^2",
"studentDetails.studentName^2",
],
query: search,
fuzziness: "AUTO",
},
},
},
},
};
If I search for the title "hard work" (with a space), it returns data like this:
"searchResults": [
{
"_id": "92",
"_score": 19.04531,
"_source": {
"standards": {
"standard": "3",
"categoryType": "STANDARD",
"categoryId": "S3"
},
"schoolDetails": {
"categoryType": "SCHOOL",
"schoolId": "TPS123",
"schoolType": "PUBLIC",
"logo": "91748922mn8bo9krcx71.png",
"schoolName": "Carmel CMI Public School"
},
"studentDetails": {
"studentId": 270,
"studentDp": "164646972124244.jpg",
"studentName": "Nelvin",
"about": "good student"
},
"topics": {
"categoryType": "TOPIC",
"topic": "Motivation",
"categoryId": "MY"
},
"youtubeUrl": "https://www.youtube.com/watch?v=wermQ",
"speaker": "Anna Maria Siby",
"description": "How hardwork leads to success - motivational talk by Anna",
"id": 92,
"uploadTime": "2022-03-17T10:59:59.400Z",
"title": "Hard work & Success"
}
}
]
If I search for the keyword "Hardwork" (without a space), this document is not found. I need the search either to treat the keyword as if it contained a space or to match related documents. Is there a solution for this? Can you please help me out?
I made an example using a shingle analyzer.
Mapping:
{
"settings": {
"analysis": {
"filter": {
"shingle_filter": {
"type": "shingle",
"max_shingle_size": 4,
"min_shingle_size": 2,
"output_unigrams": "true",
"token_separator": ""
}
},
"analyzer": {
"shingle_analyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"lowercase",
"shingle_filter"
]
}
}
}
},
"mappings": {
"properties": {
"title": {
"type": "text",
"analyzer": "shingle_analyzer"
}
}
}
}
I then tested it with your title. Note that the token "hardwork" is generated, but other shingles such as "hardworksuccess" and "worksuccess" are generated as well, which may be a problem for you.
GET idx-separator-words/_analyze
{
"analyzer": "shingle_analyzer",
"text": ["Hard work & Success"]
}
Results:
{
"tokens" : [
{
"token" : "hard",
"start_offset" : 0,
"end_offset" : 4,
"type" : "<ALPHANUM>",
"position" : 0
},
{
"token" : "hardwork",
"start_offset" : 0,
"end_offset" : 9,
"type" : "shingle",
"position" : 0,
"positionLength" : 2
},
{
"token" : "hardworksuccess",
"start_offset" : 0,
"end_offset" : 19,
"type" : "shingle",
"position" : 0,
"positionLength" : 3
},
{
"token" : "work",
"start_offset" : 5,
"end_offset" : 9,
"type" : "<ALPHANUM>",
"position" : 1
},
{
"token" : "worksuccess",
"start_offset" : 5,
"end_offset" : 19,
"type" : "shingle",
"position" : 1,
"positionLength" : 2
},
{
"token" : "success",
"start_offset" : 12,
"end_offset" : 19,
"type" : "<ALPHANUM>",
"position" : 2
}
]
}
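To make it easier to see where these tokens come from, here is a minimal sketch in plain JavaScript that imitates the shingle filter settings above (min 2, max 4, unigrams on, empty separator). It is only an illustration, not Elasticsearch code: the helper names shingles and tokenize are made up for this example, and tokenize is a rough stand-in for the standard tokenizer plus the lowercase filter.

```javascript
// Illustrative re-implementation of shingle generation; not Elasticsearch code.
function shingles(tokens, { min = 2, max = 4, separator = "", unigrams = true } = {}) {
  const out = [];
  for (let start = 0; start < tokens.length; start++) {
    // Emit the single token first when output_unigrams is enabled.
    if (unigrams) out.push(tokens[start]);
    // Then emit every shingle of min..max tokens that starts at this position.
    for (let size = min; size <= max && start + size <= tokens.length; size++) {
      out.push(tokens.slice(start, start + size).join(separator));
    }
  }
  return out;
}

// Hypothetical stand-in for the standard tokenizer + lowercase filter:
// keeps runs of ASCII letters and digits, so "&" is dropped.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9]+/g) || [];
}

const titleTokens = shingles(tokenize("Hard work & Success"));
console.log(titleTokens);
console.log(titleTokens.includes("hardwork"));
```

Because the separator is empty, "hard" and "work" are glued into "hardwork", which is why a query for "Hardwork" can match the title once it is indexed with this analyzer.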
I have a name field with the value "abc_name". When I search for "abc_", I get the proper results, but when I search for "abc_##£&-#&", I still get the same results. I want my query to ignore special characters that don't match my query.
My query uses:
multi_match
type cross_fields
operator AND
I am using the standard search_analyzer for my fields.
I want to keep this structure as it is, otherwise it will affect my other search behaviour:
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
},
"analyzer": "autocomplete",
"search_analyzer": "standard"
}
Please see the sample below, where I've created a custom analyzer that should fit your use case:
Sample Mapping:
PUT some_test_index
{
"settings": {
"analysis": {
"analyzer": {
"my_custom_analyzer": {
"type": "custom",
"tokenizer": "custom_tokenizer",
"filter": ["lowercase", "3_5_edge_ngram"]
}
},
"tokenizer": {
"custom_tokenizer": {
"type": "pattern",
"pattern": "\\w+_+[^a-zA-Z\\d\\s_]+|\\s+"
}
},
"filter": {
"3_5_edge_ngram": {
"type": "edge_ngram",
"min_gram": 3,
"max_gram": 5
}
}
}
},
"mappings": {
"properties": {
"my_field":{
"type": "text",
"analyzer": "my_custom_analyzer"
}
}
}
}
The pattern tokenizer treats every match of the pattern as a separator. A token of the form abc_$%^^## matches the pattern as a whole, so it is dropped and never indexed.
Note that the analyzer works in this order:
First, the tokenizer runs.
Then the lowercase and edge_ngram filters are applied to the tokens it generated.
To see which tokens the tokenizer itself generates, remove the edge_ngram filter from the mapping above and call the Analyze API, which returns the following:
POST some_test_index/_analyze
{
"analyzer": "my_custom_analyzer",
"text": "abc_name asda efg_!##!## 1213_adav"
}
Tokens generated:
{
"tokens" : [
{
"token" : "abc_name",
"start_offset" : 0,
"end_offset" : 8,
"type" : "word",
"position" : 0
},
{
"token" : "asda",
"start_offset" : 9,
"end_offset" : 13,
"type" : "word",
"position" : 1
},
{
"token" : "1213_adav",
"start_offset" : 25,
"end_offset" : 34,
"type" : "word",
"position" : 2
}
]
}
Note that the token efg_!##!## has been removed.
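As a rough way to experiment with the pattern outside Elasticsearch, the sketch below imitates the analyzer in plain JavaScript: split on the same regular expression, lowercase, then build edge n-grams of length 3 to 5. This is an approximation under the assumption that the JavaScript and Java regex engines agree on this ASCII pattern; the function names are made up for the example.

```javascript
// Same regular expression as the "pattern" setting above.
// Every match is treated as a separator, as the pattern tokenizer does by default.
const SEPARATOR = /\w+_+[^a-zA-Z\d\s_]+|\s+/;

// Split on the separator and drop the empty pieces left between adjacent matches.
function patternTokenize(text) {
  return text.split(SEPARATOR).filter((token) => token.length > 0);
}

// Prefixes of the token with lengths min..max; shorter tokens yield nothing.
function edgeNgrams(token, min = 3, max = 5) {
  const grams = [];
  for (let n = min; n <= Math.min(max, token.length); n++) {
    grams.push(token.slice(0, n));
  }
  return grams;
}

// Tokenizer first, then lowercase, then edge n-grams.
function analyze(text) {
  return patternTokenize(text)
    .map((token) => token.toLowerCase())
    .flatMap((token) => edgeNgrams(token));
}

console.log(patternTokenize("abc_name asda efg_!##!## 1213_adav"));
console.log(analyze("abc_"));
console.log(patternTokenize("abc_##£&-#&"));
```

The last call returns an empty list: a query such as abc_##£&-#& produces no tokens at all, so it cannot match anything, while abc_ still produces the prefixes abc and abc_.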
I've added the edge_ngram filter because you want a search for abc_ to succeed when the token generated by the tokenizer is abc_name.
Sample Document:
POST some_test_index/_doc/1
{
"my_field": "abc_name asda efg_!##!## 1213_adav"
}
Query Request:
Use-case 1:
POST some_test_index/_search
{
"query": {
"match": {
"my_field": "abc_"
}
}
}
Use-case-2:
POST some_test_index/_search
{
"query": {
"match": {
"my_field": "efg_!##!##"
}
}
}
Responses:
Response for use-case-1:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 0.47992462,
"hits" : [
{
"_index" : "some_test_index",
"_type" : "_doc",
"_id" : "1",
"_score" : 0.47992462,
"_source" : {
"my_field" : "abc_name asda efg_!##!## 1213_adav"
}
}
]
}
}
Response for use-case-2:
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 0,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
}
}
Updated Answer:
Create your mapping as follows based on the index I've created and let me know if that works:
PUT some_test_index
{
"settings": {
"analysis": {
"analyzer": {
"my_custom_analyzer": {
"type": "custom",
"tokenizer": "punctuation",
"filter": ["lowercase"]
}
},
"tokenizer": {
"punctuation": {
"type": "pattern",
"pattern": "\\w+_+[^a-zA-Z\\d\\s_]+|\\s+"
}
}
}
},
"mappings": {
"properties": {
"my_field":{
"type": "text",
"analyzer": "autocomplete",
"search_analyzer": "my_custom_analyzer"
}
}
}
}
Note that the autocomplete analyzer is assumed to be already defined in your index settings, and that my_custom_analyzer is set as the search_analyzer. Please try this and let me know if it works for all your use cases.
I'm trying to query my database to get information that involves two collections.
First, I have a collection called collectables that stores all the items a user can obtain through an app.
For example, this collection can hold up to 100 items; that is the maximum.
This is one document from this collection (called collectables):
{
"_id" : ObjectId("5d387ecfbb676b173aa57fe3"),
"img" : "some url",
"name" : "La 5",
"__v" : 0,
"amount" : 17,
"available" : 16,
"collec" : ObjectId("5d36c0c34c86991db93bd7c8"),
"gen" : 3,
"metadata" : {},
"position" : 1
}
Then I have another collection called AppUsers, where I store all the information related to a user. Each user of the app has their own record here. Besides the user's metadata, such as alias, avatar and age, I have an array field called collectables, where I store which collectables each user has.
For example, if a user has 10 collectables (from the other collection), this array has 10 entries. A user can win the same item (collectable) twice or more, so each entry stores a count field with the total. For example, when a user wins the collectable with id 1 for the first time, I add an entry to the array with count 1. If the user then wins the same collectable again, the count becomes 2, and so on.
This is an example of one user with 3 collectables and several copies of each item:
{
"_id" : ObjectId("5d36dc9445526a215c4eff52"),
"twitter" : "1",
"alias" : "ViktorCrowley",
"__v" : 25,
"collectables" : [
{
"count" : 12,
"collectable" : ObjectId("5d36c1ba4c86991db93bd7e7")
},
{
"count" : 25,
"collectable" : ObjectId("5d36c13d4c86991db93bd7c9")
},
{
"count" : 8,
"collectable" : ObjectId("5d381e122f25221126a98f9c")
}
]
}
So in this case the user has 3 different items (collectables), but imagine that the first collection contains 100 collectables in total.
Now, what I'm looking for: I need a paginated query that returns the items from the first collection (collectables) and, when the user already has one of these items, marks it with the total count. In other words, I want all the items from the first collection with a new field called count. If the user has no entry for an item in their array, count will be 0; if the user has, for example, 4 copies of that item, count will be 4.
Something like this:
[
{
"_id" : ObjectId("5d387ecfbb676b173aa57fe3"),
"img" : "some url",
"name" : "THe one",
"__v" : 0,
"amount" : 17,
"available" : 16,
"collec" : ObjectId("5d36c0c34c86991db93bd7c8"),
"gen" : 3,
"metadata" : {},
"position" : 1,
"count" : 0
},
{
"_id" : ObjectId("5d387ecfbb676b173aa57fe3"),
"img" : "some url",
"name" : "The two",
"__v" : 0,
"amount" : 17,
"available" : 16,
"collec" : ObjectId("5d36c0c34c86991db93bd7c8"),
"gen" : 3,
"metadata" : {},
"position" : 2,
"count" : 1
},
{
"_id" : ObjectId("5d387ecfbb676b173aa57fe4"),
"img" : "some url",
"name" : "The Three",
"__v" : 0,
"amount" : 17,
"available" : 16,
"collec" : ObjectId("5d36c0c34c86991db93bd7c8"),
"gen" : 3,
"metadata" : {},
"position" : 3,
"count" : 0
},
{
"_id" : ObjectId("5d387ecfbb676b173aa57fe4"),
"img" : "Some url",
"name" : "La 5",
"__v" : 0,
"amount" : 17,
"available" : 16,
"collec" : ObjectId("5d36c0c34c86991db93bd7c8"),
"gen" : 3,
"metadata" : {},
"position" : 4,
"count" : 12
}
]
I tried several things using aggregate and $lookup, but I can't make it work.
The only thing I managed was to retrieve the info from AppUser together with the total count.
Something like this (with mongoose):
AppUser.aggregate([
{ $match: matchQuery },
{$unwind: "$collectables"},
{
$lookup:
{
from: "collectables",
localField: "collectables.collectable",
foreignField: "_id",
as: "result"
}
},
{ $sort: { "result.position": 1 } },
{$unwind: "$result"},
{ $addFields : { "result.count" : "$collectables.count" }
},
{ $replaceRoot: { newRoot: "$result" } },
{ $skip: size * (page - 1) },
{ $limit: size }
]).exec((err, result) =>
{
if (err)
{
console.log(err);
return res.status(401).send({ success: false });
}
else {
return res.status(200).send({ success: true, result });
}
});
So I need help, because I have spent three days on this and I can't get anything that works.
Thanks in advance.
The query below returns the expected output:
db.appuser.aggregate([
{ $unwind: "$collectables" },
{
$lookup: {
from: "collectables",
localField: "collectables.collectable",
foreignField: "_id",
as: "result"
}
},
{
$project: {
result: 1,
flag: { "$gt": [ {"$size": "$result"}, 0 ] },
"collectables.count": 1,
alias: 1,
_id: 0
}
},
{
$match: { flag: true }
}
]);
Unwind the appuser.collectables array, then do a $lookup between the collectables and appuser collections, then use $project and $match to get the desired result.
In the $project stage, I added a flag that checks the size of the result array produced by $lookup; the $match stage then uses it to keep only the documents that found a matching collectable.
Sample Data used
db.collectables.find()
{
"_id": "5d387ecfbb676b173aa57fe3",
"img": "some url",
"name": "La 5",
"__v": 0,
"amount": 17,
"available": 16,
"collec": "5d36c0c34c86991db93bd7c8",
"gen": 3,
"metadata": {
},
"position": 1
}
db.appuser.find()
[{
"_id": "5d36dc9445526a215c4eff52",
"twitter": "1",
"alias": "ViktorCrowley",
"__v": 25,
"collectables": [
{
"count": 12,
"collectable": "5d36c1ba4c86991db93bd7e7"
},
{
"count": 25,
"collectable": "5d36c13d4c86991db93bd7c9"
},
{
"count": 8,
"collectable": "5d381e122f25221126a98f9c"
}
]
},
{
"_id": "5d36dc9445526a215c4efga1",
"twitter": "1",
"alias": "John",
"__v": 11,
"collectables": [
{
"count": 10,
"collectable": "5d387ecfbb676b173aa57fe3"
}
]
},
{
"_id": "6dc2dc9445526a215c4efga1",
"twitter": "1",
"alias": "Alice",
"__v": 11,
"collectables": [
{
"count": 25,
"collectable": "5d387ecfbb676b173aa57fe3"
},
{
"count": 3,
"collectable": "5d36c13d4c86991db9312349"
},
{
"count": 5,
"collectable": "5d381e122f25221126a9711c"
}
]
}]
Final Output
{
"alias": "John",
"collectables": {
"count": 10
},
"result": [
{
"_id": "5d387ecfbb676b173aa57fe3",
"img": "some url",
"name": "La 5",
"__v": 0,
"amount": 17,
"available": 16,
"collec": "5d36c0c34c86991db93bd7c8",
"gen": 3,
"metadata": {
},
"position": 1
}
],
"flag": true
}
{
"alias": "Alice",
"collectables": {
"count": 25
},
"result": [
{
"_id": "5d387ecfbb676b173aa57fe3",
"img": "some url",
"name": "La 5",
"__v": 0,
"amount": 17,
"available": 16,
"collec": "5d36c0c34c86991db93bd7c8",
"gen": 3,
"metadata": {
},
"position": 1
}
],
"flag": true
}
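The question also asks for every collectable to be listed, with count set to 0 when the user does not own it. One possible way to get that shape, shown here only as an untested sketch, is to start the aggregation from collectables and look up the single user inside a $lookup sub-pipeline (this form of $lookup needs MongoDB 3.6 or later). The collection name "appusers", the function name and the idea of matching the user by _id are assumptions made for this example; the original code filters users with matchQuery instead.

```javascript
// Builds an aggregation pipeline meant to run on the collectables collection.
// Assumptions: the users live in a collection named "appusers" and
// userId is already an ObjectId (a plain string is used only for the demo below).
function buildCollectablesWithCount(userId, page, size) {
  return [
    // Paginate the collectables first, ordered by position.
    { $sort: { position: 1 } },
    { $skip: size * (page - 1) },
    { $limit: size },
    {
      $lookup: {
        from: "appusers",
        let: { collectableId: "$_id" },
        pipeline: [
          // Keep only the requested user, then one row per owned collectable.
          { $match: { _id: userId } },
          { $unwind: "$collectables" },
          // Keep the entry that refers to the current collectable, if any.
          { $match: { $expr: { $eq: ["$collectables.collectable", "$$collectableId"] } } },
          { $project: { _id: 0, count: "$collectables.count" } },
        ],
        as: "owned",
      },
    },
    // $sum over an empty array is 0, so collectables the user lacks get count 0.
    { $addFields: { count: { $sum: "$owned.count" } } },
    { $project: { owned: 0 } },
  ];
}

// Demo only: prints the pipeline for page 3 with 10 items per page.
console.log(JSON.stringify(buildCollectablesWithCount("demo-user-id", 3, 10), null, 2));
```

With mongoose this could be passed to aggregate on the model that maps to collectables, for example Collectable.aggregate(buildCollectablesWithCount(userId, page, size)), where Collectable is a hypothetical model name.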
Hope it helps!
I can't find any documentation on what happens if the Elasticsearch Bulk API fails on one or more of the actions. For example, in the following request, let's say there is already a document with id "3", so "create" should fail. Does this fail all of the other actions?
{ "index" : { "_index" : "test", "_type" : "type1", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_type" : "type1", "_id" : "2" } }
{ "create" : { "_index" : "test", "_type" : "type1", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_id" : "1", "_type" : "type1", "_index" : "index1"} }
{ "doc" : {"field2" : "value2"} }
I'm using the Node.js Elasticsearch module.
No, a failure in one action does not affect the others.
From the documentation of the Elasticsearch Bulk API:
The response to a bulk action is a large JSON structure with the
individual results of each action that was performed. The failure of a
single action does not affect the remaining actions.
In the response from the Elasticsearch client, each action has its own status, which tells you whether that action failed.
Example:
client.bulk({
body: [
// action description
{ index: { _index: 'test', _type: 'test', _id: 1 } },
// the document to index
{ title: 'foo' },
// action description
{ update: { _index: 'test', _type: 'test', _id: 332 } },
// the document to update
{ doc: { title: 'foo' } },
// action description
{ delete: { _index: 'test', _type: 'test', _id: 33 } },
// no document needed for this delete
]
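}, function (err, resp) {
// A request-level failure (for example a connection error) is reported via err
if (err) {
console.error(err);
return;
}
// resp.errors is true when at least one action failed
if (resp.errors) {
console.log(JSON.stringify(resp, null, '\t'));
}
});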
Response:
{
"took": 13,
"errors": true,
"items": [
{
"index": {
"_index": "test",
"_type": "test",
"_id": "1",
"_version": 20,
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"status": 200
}
},
{
"update": {
"_index": "test",
"_type": "test",
"_id": "332",
"status": 404,
"error": {
"type": "document_missing_exception",
"reason": "[test][332]: document missing",
"shard": "-1",
"index": "test"
}
}
},
{
"delete": {
"_index": "test",
"_type": "test",
"_id": "33",
"_version": 2,
"_shards": {
"total": 2,
"successful": 1,
"failed": 0
},
"status": 404,
"found": false
}
}
]
}
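Since each item carries its own result, the failed actions can be picked out of the response after the call. The helper below is a small sketch of that idea; the function name failedBulkItems is made up for this example, and it treats an item as failed only when it has an error field, which is what the update item above shows. The sample object is a trimmed copy of the response above.

```javascript
// Collects the actions that failed from a bulk response body.
function failedBulkItems(resp) {
  // errors is false when every action succeeded, so there is nothing to collect.
  if (!resp || !resp.errors) return [];
  return resp.items
    .map((item, position) => {
      // Each item has exactly one key: "index", "create", "update" or "delete".
      const action = Object.keys(item)[0];
      return { position, action, ...item[action] };
    })
    .filter((result) => result.error !== undefined);
}

// Trimmed copy of the response shown above.
const sampleBulkResponse = {
  took: 13,
  errors: true,
  items: [
    { index: { _index: "test", _type: "test", _id: "1", status: 200 } },
    {
      update: {
        _index: "test",
        _type: "test",
        _id: "332",
        status: 404,
        error: { type: "document_missing_exception", reason: "[test][332]: document missing" },
      },
    },
    { delete: { _index: "test", _type: "test", _id: "33", status: 404, found: false } },
  ],
};

const failures = failedBulkItems(sampleBulkResponse);
console.log(failures);
```

Here only the update of document 332 is reported. The delete of a missing document also has status 404 but no error field, so this sketch does not count it as a failure; adjust the filter if you want to treat it as one.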
I'm pretty new to Elasticsearch and I want to use synonyms. I added these lines to the configuration file:
index :
    analysis :
        analyzer :
            synonym :
                type : custom
                tokenizer : whitespace
                filter : [synonym]
        filter :
            synonym :
                type : synonym
                synonyms_path: synonyms.txt
Then I created an index named test:
"mappings" : {
"test" : {
"properties" : {
"text_1" : {
"type" : "string",
"analyzer" : "synonym"
},
"text_2" : {
"search_analyzer" : "standard",
"index_analyzer" : "synonym",
"type" : "string"
},
"text_3" : {
"type" : "string",
"analyzer" : "synonym"
}
}
}
}
and inserted a document of type test with this data:
{
"text_3" : "foo dog cat",
"text_2" : "foo dog cat",
"text_1" : "foo dog cat"
}
synonyms.txt contains "foo,bar,baz". When I search for foo, it returns what I expected, but when I search for baz or bar, it returns zero results:
{
"query":{
"query_string":{
"query" : "bar",
"fields" : [ "text_1"],
"use_dis_max" : true,
"boost" : 1.0
}}}
result:
{
"took":1,
"timed_out":false,
"_shards":{
"total":5,
"successful":5,
"failed":0
},
"hits":{
"total":0,
"max_score":null,
"hits":[
]
}
}
I don't know whether your problem is caused by a bad synonym definition for "bar". Since you said you are pretty new, I'll show an example similar to yours that works, to illustrate how Elasticsearch deals with synonyms at search time and at index time. Hope it helps.
First, create the synonym file:
foo => foo bar, baz
Now I create the index with the particular settings you are trying to test:
curl -XPUT 'http://localhost:9200/test/' -d '{
"settings": {
"index": {
"analysis": {
"analyzer": {
"synonym": {
"tokenizer": "whitespace",
"filter": ["synonym"]
}
},
"filter" : {
"synonym" : {
"type" : "synonym",
"synonyms_path" : "synonyms.txt"
}
}
}
}
},
"mappings": {
"test" : {
"properties" : {
"text_1" : {
"type" : "string",
"analyzer" : "synonym"
},
"text_2" : {
"search_analyzer" : "standard",
"index_analyzer" : "standard",
"type" : "string"
},
"text_3" : {
"type" : "string",
"search_analyzer" : "synonym",
"index_analyzer" : "standard"
}
}
}
}
}'
Note that synonyms.txt must be in the same directory as the configuration file, since that path is relative to the config directory.
Now index a doc:
curl -XPUT 'http://localhost:9200/test/test/1' -d '{
"text_3": "baz dog cat",
"text_2": "foo dog cat",
"text_1": "foo dog cat"
}'
Now the searches
Searching in field text_1
curl -XGET 'http://localhost:9200/test/_search?q=text_1:baz'
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.15342641,
"hits": [
{
"_index": "test",
"_type": "test",
"_id": "1",
"_score": 0.15342641,
"_source": {
"text_3": "baz dog cat",
"text_2": "foo dog cat",
"text_1": "foo dog cat"
}
}
]
}
}
You get the document because baz is a synonym of foo, and at index time foo is expanded with its synonyms.
Searching in field text_2
curl -XGET 'http://localhost:9200/test/_search?q=text_2:baz'
result:
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
I don't get any hits because synonyms were not expanded while indexing (standard analyzer), and since I'm searching for baz and baz is not in the text, nothing matches.
Searching in field text_3
curl -XGET 'http://localhost:9200/test/_search?q=text_3:foo'
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 0.15342641,
"hits": [
{
"_index": "test",
"_type": "test",
"_id": "1",
"_score": 0.15342641,
"_source": {
"text_3": "baz dog cat",
"text_2": "foo dog cat",
"text_1": "foo dog cat"
}
}
]
}
}
Note: text_3 is "baz dog cat".
text_3 was indexed without expanding synonyms. Since I'm searching for foo, which has "baz" as one of its synonyms, I get the result.
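The three cases can be summarised with a toy model in plain JavaScript. It is only a sketch of the idea, not how Elasticsearch works internally: the names RULES, plain, withSynonyms and matches are invented for this example, tokens are split on whitespace only, the rule from synonyms.txt is reduced to "foo becomes foo, bar and baz", and a document is considered a hit when any query token equals any indexed token.

```javascript
// Toy version of the rule "foo => foo bar, baz": foo is replaced by foo, bar and baz.
const RULES = { foo: ["foo", "bar", "baz"] };

// Analyzer without synonyms: split on whitespace only.
const plain = (text) => text.split(/\s+/).filter(Boolean);

// Analyzer with synonyms: expand every token that has a rule.
const withSynonyms = (text) => plain(text).flatMap((token) => RULES[token] || [token]);

// A document is a hit when any analyzed query token equals any indexed token.
function matches(docText, queryText, indexAnalyzer, searchAnalyzer) {
  const indexed = new Set(indexAnalyzer(docText));
  return searchAnalyzer(queryText).some((token) => indexed.has(token));
}

// text_1: synonyms at index time and at search time, query "baz".
const text1Hit = matches("foo dog cat", "baz", withSynonyms, withSynonyms);
// text_2: no synonyms at all, query "baz".
const text2Hit = matches("foo dog cat", "baz", plain, plain);
// text_3: synonyms only at search time, query "foo".
const text3Hit = matches("baz dog cat", "foo", plain, withSynonyms);

console.log({ text1Hit, text2Hit, text3Hit });
```

In this model text_1 and text_3 are hits and text_2 is not, which mirrors the three search results above: the expansion can happen on either side, but it has to happen on at least one of them.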
If you want to debug, you can use the _analyze endpoint, for example:
curl -XGET 'http://localhost:9200/test/_analyze?text=foo&analyzer=synonym&pretty=true'
result:
{
"tokens": [
{
"token": "foo",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 1
},
{
"token": "baz",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 1
},
{
"token": "bar",
"start_offset": 0,
"end_offset": 3,
"type": "SYNONYM",
"position": 2
}
]
}