Query Doesn't Match Numbers In Text - node.js

Match queries should be able to find strings that contain numbers; in this case I am trying to match phone numbers. Mappings and analyzers are provided below. For example, I have a document indexed as follows:
{
  "userId": 126817,
  "name": "Test User",
  "phoneNumber": "5551112233"
}
When I use a match query, it doesn't match anything:
{"match" : {"phoneNumber": {"query": "555"}}}
When I use a prefix query, it does match:
{"prefix" : {"phoneNumber": {"value": "555"}}}
Analyze Results
{
  "tokens": [
    {
      "token": "5551112233",
      "start_offset": 0,
      "end_offset": 10,
      "type": "<NUM>",
      "position": 0
    }
  ]
}
Mapping
{
  index: "user-clinics",
  type: "user-clinic",
  body: { properties: { id: { type: "long" } } }
}
Analyzers
const TurkishAnalyzer = {
  analysis: {
    filter: {
      my_ascii_folding: {
        type: "asciifolding",
        preserve_original: true
      }
    },
    analyzer: {
      turkish_analyzer: {
        tokenizer: "standard",
        filter: ["lowercase", "my_ascii_folding"]
      }
    }
  }
};
const AutoCompleteAnalyzer = {
  analysis: {
    filter: {
      autocomplete_filter: {
        type: "edge_ngram",
        min_gram: 1,
        max_gram: 20
      }
    },
    analyzer: {
      autocomplete_search: {
        type: "custom",
        tokenizer: "standard",
        filter: ["lowercase"]
      },
      autocomplete_index: {
        type: "custom",
        tokenizer: "standard",
        filter: ["lowercase", "autocomplete_filter"]
      }
    }
  }
};

It's because edge_ngram tokenizes only from the beginning of each token, so only prefixes are indexed, e.g. a, as, asd, asd1, asd12, asd123.
You need to change your autocomplete_filter to ngram if you also want to match inside tokens, e.g. d12 or 123.
Beware, though, that this may generate many more tokens.
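To illustrate the difference, here is a rough JavaScript sketch of what the two filters emit (an approximation for illustration, not Elasticsearch's actual implementation):

```javascript
// edge_ngram: substrings anchored at the start of the token only.
function edgeNgrams(token, min = 1, max = 20) {
  const grams = [];
  for (let len = min; len <= Math.min(max, token.length); len++) {
    grams.push(token.slice(0, len)); // always a prefix
  }
  return grams;
}

// ngram: every substring within the length bounds.
function ngrams(token, min = 1, max = 20) {
  const grams = [];
  for (let len = min; len <= Math.min(max, token.length); len++) {
    for (let start = 0; start + len <= token.length; start++) {
      grams.push(token.slice(start, start + len));
    }
  }
  return grams;
}

const phone = '5551112233';
console.log(edgeNgrams(phone).includes('555')); // true  — "555" is a prefix
console.log(edgeNgrams(phone).includes('111')); // false — not a prefix
console.log(ngrams(phone).includes('111'));     // true  — interior substring
```

This also makes the token-count trade-off concrete: for a 10-digit token, edge_ngram emits 10 grams while ngram emits dozens.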

Related

elasticsearch doesn't find results when searching the exact term

I am using the elasticsearch module in my Node.js app to query my index with fuzzy completion suggesters. The text I'm trying to search is Rome–Fiumicino Leonardo da Vinci International Airport. When searching this term I get no results, but if I cut the term to 50 characters it does find it and returns results.
const result = await elasticsearch.search({
  index: 'myIndex',
  body: {
    suggest: {
      fuzzinessZero: {
        text,
        completion: {
          field: 'name_suggest',
          fuzzy: { fuzziness: 0 },
          contexts,
        },
      },
      fuzzinessOne: {
        text,
        completion: {
          field: 'name_suggest',
          fuzzy: { fuzziness: 1 },
          contexts,
        },
      },
      fuzzinessTwo: {
        text,
        completion: {
          field: 'name_suggest',
          fuzzy: { fuzziness: 2 },
          contexts,
        },
      },
    },
  },
})
This is the result I get in fuzzinessOne
As you can see, the text field in the result is cut to 50 characters (maybe that's the issue). Inside _source I get back all the inputs used for the search, and one of them is the full exact term I searched for, along with all the other available combinations.
It is worth mentioning that I'm using AWS OpenSearch.
And this is the settings which I use to create the index:
settings: {
  analysis: {
    filter: {
      autocomplete_filter: {
        type: 'edge_ngram',
        min_gram: 2,
        max_gram: 20,
      },
      shingle_filter: {
        type: 'shingle',
        max_shingle_size: 3,
      },
    },
    analyzer: {
      autocomplete: {
        type: 'custom',
        tokenizer: 'standard',
        filter: ['lowercase', 'shingle_filter', 'asciifolding'],
      },
    },
  },
}
You are facing this issue because the default value of the max_input_length parameter is 50.
Below is the description given for this parameter in the documentation:
Limits the length of a single input, defaults to 50 UTF-16 code
points. This limit is only used at index time to reduce the total
number of characters per input string in order to prevent massive
inputs from bloating the underlying datastructure. Most use cases
won’t be influenced by the default value since prefix completions
seldom grow beyond prefixes longer than a handful of characters.
You can keep this default behaviour, or you can update your index mapping with an increased value of the max_input_length parameter and reindex your data:
{
  "mappings": {
    "dynamic": "false",
    "properties": {
      "namesuggest": {
        "type": "completion",
        "analyzer": "keyword_lowercase_analyzer",
        "preserve_separators": true,
        "preserve_position_increments": true,
        "max_input_length": 100,
        "contexts": [
          {
            "name": "searchable",
            "type": "CATEGORY"
          }
        ]
      }
    }
  },
  "settings": {
    "index": {
      "mapping": {
        "ignore_malformed": "true"
      },
      "refresh_interval": "5s",
      "analysis": {
        "analyzer": {
          "keyword_lowercase_analyzer": {
            "filter": [
              "lowercase"
            ],
            "type": "custom",
            "tokenizer": "keyword"
          }
        }
      },
      "number_of_replicas": "0",
      "number_of_shards": "1"
    }
  }
}
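A quick way to see why this particular airport name is affected, in plain JavaScript (String.length counts UTF-16 code units, which is what the default limit measures):

```javascript
// The search text from the question, and the completion suggester's
// default input limit of 50 UTF-16 code points.
const text = 'Rome–Fiumicino Leonardo da Vinci International Airport';
const MAX_INPUT_LENGTH = 50;

console.log(text.length);              // 54 — over the default limit
const truncated = text.slice(0, MAX_INPUT_LENGTH);
console.log(truncated);                // roughly what gets indexed
console.log(truncated.length);         // 50
```

Since the indexed input is cut at 50 code points, a query for the full 54-character term cannot match, while the 50-character prefix can.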

MongoDB query to find in nested schema

This query is returning the first object, but it should not return it, because it has the BU but under a different domain. It works fine when there is a single object in collaborators; when there are multiple, it does not behave as expected. How can we do this? Any suggestions?
My criteria is: in the collaborators array, a document should be returned if it has
only the BU name, or
only the domain, or
both BU and domain.
In the situation below, the first document has the same domain ({"domain": "xyz.com"}), but it is still not returned. Why?
[
  {
    name: "1",
    collaborators: [
      { "domain": "xyz.com" },
      { "buName": "Vignesh B" },
      { "domain": "yz.com" },
      { "domain": "xyz.com", "buName": "Vignesh B" }
    ]
  },
  {
    name: "2",
    collaborators: [
      { "domain": "xyz.com", "buName": "Vignesh BU" }
    ]
  },
  {
    name: "3",
    collaborators: [
      { "domain": "xyz.com" }
    ]
  },
  {
    name: "4",
    collaborators: [
      { "buName": "Vignesh BU" },
      { "domain": "xyz.com" },
      { "domain": "xyz.com", "buName": "Vignesh BU" }
    ]
  }
]
db.collection.find({
  $or: [
    {
      "collaborators.domain": "xyz.com",
      "collaborators.buName": { "$exists": false }
    },
    {
      "collaborators.buName": "Vignesh BU",
      "collaborators.domain": { "$exists": false }
    },
    {
      "collaborators.buName": "Vignesh BU",
      "collaborators.domain": "xyz.com"
    }
  ]
})
It is not returning the first document because the buName values in this document are "Vignesh B", not "Vignesh BU". Just add a U to Vignesh B and it works.
Link to mongodb playground
I think there was a comment at one point saying that the name: "1" document was expected to be returned (as it matches the second "Only Domain" criterion), but it currently is not. This is because you need to use the $elemMatch operator, since you are querying an array with multiple conditions.
The query should look as follows, as demonstrated in this playground example (note that I've changed the name: 3 document so that it would not match):
db.collection.find({
  $or: [
    {
      "collaborators": {
        $elemMatch: {
          "domain": "xyz.com",
          "buName": { "$exists": false }
        }
      }
    },
    {
      "collaborators": {
        $elemMatch: {
          "buName": "Vignesh BU",
          "domain": { "$exists": false }
        }
      }
    },
    {
      "collaborators": {
        $elemMatch: {
          "buName": "Vignesh BU",
          "domain": "xyz.com"
        }
      }
    }
  ]
})
Why is this change needed? It is because of the semantics of how querying an array works in MongoDB. When querying on multiple nested conditions without using $elemMatch you are telling the database that different entries in the array can each individually satisfy the requirements. As shown in this playground example, that means that when you run this query:
db.collection.find({
  "arr.str": "abc",
  "arr.int": 123
})
The following document will match:
{
  _id: 1,
  arr: [
    { str: "abc" },
    { int: 123 }
  ]
}
This is because the first entry in the array satisfies one of the query predicates while the other entry in the array satisfies the second predicate. Changing the query to use $elemMatch changes the semantics to specify that a single entry in the array must successfully satisfy all query predicate conditions which prevents the document above from matching.
In your specific situation the same thing was happening with your first set of conditions of:
{
  "collaborators.domain": "xyz.com",
  "collaborators.buName": { "$exists": false }
}
The first array item in the name: "1" document was matching the collaborators.domain condition. The problem was the second condition. While that same first array entry did not have a buName field, two of the other entries in the array did. Since there is no $elemMatch present, the database checked those other entries, found that the buName existed there, and that caused the query predicates to fail to match and for the document to not get returned. Adding the $elemMatch forces both of those checks to happen against the single entry in the array hence resolving the issue.
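The two semantics can be sketched in plain JavaScript against the name: "1" document (an illustration of the idea only, not MongoDB's actual matching engine):

```javascript
// The name: "1" document from the question.
const doc = {
  name: '1',
  collaborators: [
    { domain: 'xyz.com' },
    { buName: 'Vignesh B' },
    { domain: 'yz.com' },
    { domain: 'xyz.com', buName: 'Vignesh B' },
  ],
};

// Without $elemMatch: "collaborators.domain" matches if ANY element matches,
// but "collaborators.buName": {$exists: false} requires that NO element
// in the array has a buName field at that path.
const withoutElemMatch =
  doc.collaborators.some(c => c.domain === 'xyz.com') &&
  !doc.collaborators.some(c => 'buName' in c);

// With $elemMatch: a SINGLE element must satisfy both predicates at once.
const withElemMatch =
  doc.collaborators.some(c => c.domain === 'xyz.com' && !('buName' in c));

console.log(withoutElemMatch); // false — other elements do have buName
console.log(withElemMatch);    // true  — { domain: 'xyz.com' } alone qualifies
```

This mirrors the behaviour above: the document fails the non-$elemMatch form because of the buName fields elsewhere in the array, but passes once both checks are pinned to one element.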

How to use filter expressions on aws using python3 for nested map attribute?

I have been trying to scan DynamoDB to check for a particular value in a nested map attribute named deliverables. However, using scan with filter expressions returns an empty result.
import boto3

result = []
dynamo_client = boto3.client("dynamodb")
paginator = dynamo_client.get_paginator("scan")
operation_parameters = {
    'FilterExpression': "#Deliverable = :deliverable",
    'ExpressionAttributeNames': {
        '#Deliverable': 'deliverables.fc986523-a666-478e-8303-2a1c3c1dc4ba'
    },
    'ExpressionAttributeValues': {
        ':deliverable': {
            "M": {
                "read": {"BOOL": True},
                "upload": {"BOOL": True},
                "write": {"BOOL": True}
            }
        }
    }
}
for page in paginator.paginate(TableName="TableName", **operation_parameters):
    result.append(page["Items"])
print(result)
The items in the dynamo db look like this:
[
  [
    {
      "deliverables": {
        "M": {
          "7397d832-fefb-4ba2-97a1-0f6e73d611d9": {
            "M": {
              "read": {"BOOL": true},
              "upload": {"BOOL": true},
              "write": {"BOOL": true}
            }
          },
          "fc986523-a666-478e-8303-2a1c3c1dc4ba": {
            "M": {
              "read": {"BOOL": true},
              "upload": {"BOOL": true},
              "write": {"BOOL": true}
            }
          }
        }
      },
      "username": {"S": "username1"},
      "deniedReferences": {"L": []}
    },
    {
      "deliverables": {
        "M": {
          "7397d832-fefb-4ba2-97a1-0f6e73d611d9": {
            "M": {
              "read": {"BOOL": true},
              "upload": {"BOOL": false},
              "write": {"BOOL": false}
            }
          },
          "fc986523-a666-478e-8303-2a1c3c1dc4ba": {
            "M": {
              "read": {"BOOL": true},
              "upload": {"BOOL": false},
              "write": {"BOOL": false}
            }
          }
        }
      },
      "username": {"S": "repositoryadmin"},
      "deniedReferences": {"L": []}
    }
  ]
]
Please let me know if you can help me solve this issue.
The problem is the dot here: 'ExpressionAttributeNames': { '#Deliverable': 'deliverables.fc986523-a666-478e-8303-2a1c3c1dc4ba' }
From the expressions documentation: DynamoDB interprets a dot in an expression attribute name as a character within an attribute's name, not as a path separator. Split the path into two placeholders instead:
operation_parameters = {
    "FilterExpression": "#D0.#D1 = :deliverable",  # the dot goes here!
    "ExpressionAttributeNames": {
        "#D0": "deliverables",
        "#D1": "fc986523-a666-478e-8303-2a1c3c1dc4ba"
    },
    "ExpressionAttributeValues": {
        ":deliverable": {
            "M": {
                "read": {"BOOL": True},
                "upload": {"BOOL": True},
                "write": {"BOOL": True}
            }
        }
    }
}
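If you build these expressions in several places, a small helper can split a document path into placeholders automatically. A hypothetical JavaScript sketch (the function name and the #D prefix are my own, not part of any AWS SDK):

```javascript
// Turn a document path like 'deliverables.fc98...' into a placeholder-based
// expression path plus the ExpressionAttributeNames map, so each dot stays
// a path separator instead of being read as part of an attribute name.
function buildPathExpression(path, prefix = '#D') {
  const names = {};
  const placeholders = path.split('.').map((segment, i) => {
    const key = `${prefix}${i}`;
    names[key] = segment; // each segment becomes a safe placeholder
    return key;
  });
  return { expressionPath: placeholders.join('.'), attributeNames: names };
}

const { expressionPath, attributeNames } =
  buildPathExpression('deliverables.fc986523-a666-478e-8303-2a1c3c1dc4ba');
console.log(expressionPath);  // '#D0.#D1'
console.log(attributeNames);  // { '#D0': 'deliverables', '#D1': 'fc986523-...' }
```

The returned pieces drop straight into FilterExpression and ExpressionAttributeNames.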

search multiple field as regexp query in elasticsearch

I am trying to search by different fields, such as title and description. When I type keywords, Elasticsearch must find something if the description or the title includes the keywords I typed. This is my goal. How can I reach it?
Below is the sample code that I used for one field.
query: {
  regexp: {
    title: `.*${q}.*`,
  },
},
I also tried the one below, but it gives a syntax error.
query: {
  regexp: {
    title: `.*${q}.*`,
  },
  regexp: {
    description: `.*${q}.*`,
  },
},
To do so, you need to use a bool query.
GET /<your index>/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "regexp": {
            "title": ".*${q}.*"
          }
        },
        {
          "regexp": {
            "description": ".*${q}.*"
          }
        }
      ]
    }
  }
}
You can find more details in the bool query documentation.
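If the list of fields varies, you can build the request body programmatically. A minimal sketch (the function name is my own, and it assumes q has already been regex-escaped):

```javascript
// Build a bool/should query that applies the same regexp to each field.
// A document matches if at least one field matches.
function multiFieldRegexpQuery(q, fields) {
  return {
    query: {
      bool: {
        should: fields.map(field => ({
          regexp: { [field]: `.*${q}.*` },
        })),
        minimum_should_match: 1, // require at least one clause to match
      },
    },
  };
}

const body = multiFieldRegexpQuery('phone', ['title', 'description']);
console.log(JSON.stringify(body, null, 2));
```

The resulting object can be passed as the body of a client.search call.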

node elastic search strict match

Can anyone tell me how to do a strict match in elasticsearch-js? Here is my search:
client.search({
  index: 'hash_tag',
  type: 'hash_tag',
  lenient: false,
  body: {
    query: {
      match: {
        tag_name: 'hash tag 1'
      }
    }
  }
}).then(function (body) {
  console.log("body", JSON.stringify(body));
}, function (error) {
  console.trace(error.message);
})
This query matches hash, tag, or 1 individually; I'm looking for an exact whole-string match. Here is an example of my index style:
{
  "_index": "hash_tag",
  "_type": "hash_tag",
  "_id": "3483",
  "_score": 0.019691018,
  "_source": {
    "id": "3483",
    "labels": [
      "hash_tag"
    ],
    "tag_name": "hash tag 2"
  }
}
By default, Elasticsearch "tokenizes" your text fields when adding them to the inverted index; that's why you get results for each individual term.
To get a full match you have different approaches; the simplest is to use a match_phrase query:
GET /megacorp/employee/_search
{
  "query": {
    "match_phrase": {
      "about": "rock climbing"
    }
  }
}
Another option is to map that specific field as not_analyzed (the keyword type in recent versions), so the text isn't tokenized at all.
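A toy illustration of why the match query hits every hashtag (a lowercase-and-split approximation of the standard analyzer, not the real analysis chain):

```javascript
// Crude stand-in for the standard analyzer: lowercase, split on whitespace.
const analyze = text => text.toLowerCase().split(/\s+/);

const indexed = analyze('hash tag 2');     // tokens stored in the index
const queryTokens = analyze('hash tag 1'); // tokens produced from the query

// match-style: any shared token is enough to score the document.
const matches = queryTokens.some(t => indexed.includes(t));
console.log(matches);                       // true — 'hash' and 'tag' overlap

// keyword / not_analyzed-style: the whole string is a single term.
console.log('hash tag 2' === 'hash tag 1'); // false — no match
```

Because "hash" and "tag" appear in both token sets, the analyzed match query returns "hash tag 2" for a "hash tag 1" search, while a whole-string comparison does not.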
