I'm doing a search on multiple fields: I want to avoid common terms, and I want to combine results from all fields. Currently I have this (cutoff_frequency, boost, bool must filters, etc. omitted to make the problem more visible):
{
  query: {
    bool: {
      should: [
        {
          common: {
            title: {
              query: q
            }
          }
        },
        {
          common: {
            description: {
              query: q
            }
          }
        }
      ]
    }
  }
}
Let's say I have the following records:
id: 1
title: 'Fox'
description: 'Small fox'
id: 2
title: 'Fox'
description: 'Is quick'
id: 3
title: 'Legendary'
description: 'Quick brown fox'
My query is The quick brown fox. In the solution above, id 1 and id 2 have the highest score, because they match 2 out of 2 shoulds. The record with id 3 has no match in the title, so only 1 out of 2 shoulds are matched.
The correct order should be
id: 3 (matches 3 words)
id: 2 (matches 2 words)
id: 1 (matches 1 word, even though twice)
I don't want to map both fields into one, because I want to differentiate them in case of a tie; for example, for the query fox, the correct result would be:
id: 1 (matches twice)
id: 2 (matches once but in the title)
id: 3 (matches once but in the description)
Any suggestions on this would be highly appreciated. Thanks.
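The desired ordering can be sketched in plain JavaScript (this is only an illustration of the target scoring logic, not an Elasticsearch query; the three ranking keys are my interpretation of the rules described above):

```javascript
// Illustration only: rank documents by (1) distinct query terms matched
// across both fields, (2) total term occurrences, (3) matches in the title.
function rank(docs, query) {
  const terms = new Set(query.toLowerCase().split(/\s+/));
  const toks = s => s.toLowerCase().split(/\s+/);
  const scored = docs.map(d => {
    const hits = [...toks(d.title), ...toks(d.description)].filter(t => terms.has(t));
    return {
      id: d.id,
      distinct: new Set(hits).size,                            // unique query terms matched
      occurrences: hits.length,                                // total matches (tie-break 1)
      inTitle: toks(d.title).filter(t => terms.has(t)).length, // title matches (tie-break 2)
    };
  });
  scored.sort((a, b) =>
    b.distinct - a.distinct || b.occurrences - a.occurrences || b.inTitle - a.inTitle);
  return scored.map(s => s.id);
}
```

With the three sample records, this yields [3, 2, 1] for the query The quick brown fox and [1, 2, 3] for fox, matching the orderings above.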
I have a simple model with a single A Document collection
[{ _key: 'doc1', id: 'a/doc1', name: 'Doc 1' }, { _key: 'doc2', id: 'a/doc2', name: 'Doc 2' }]
and a single B Edge collection, joining documents of A, with an integer weight held on each edge.
[{ _key: 'xxx', id: 'b/xxx', _from: 'a/doc1', _to: 'a/doc2', weight: 256 }]
I'm trying to make a "common neighbors" style query that takes 2 documents as input and yields the common neighbors of those inputs, along with the respective weights (of each side).
For example, with doc1 and doc26 as input, here is the goal to achieve:
[
{ _key: 'doc6', weightWithDoc1: 43, weightWithDoc26: 57 },
{ _key: 'doc12', weightWithDoc1: 98, weightWithDoc26: 173 },
{ _key: 'doc21', weightWithDoc1: 3, weightWithDoc26: 98 },
]
I successfully started by targeting a single side:
FOR associated, association
IN 1..1
ANY ${d1}
${EdgeCollection}
SORT association.weight DESC
LIMIT 20
RETURN { _key: associated._key, weight: association.weight }
Then I successfully went on with the INTERSECTION logic from the documentation:
FOR proj IN INTERSECTION(
(FOR associated, association
IN 1..1
ANY ${d1}
${EdgeCollection}
RETURN { _key: associated._key }),
(FOR associated, association
IN 1..1
ANY ${d2}
${EdgeCollection}
RETURN { _key: associated._key })
)
LIMIT 20
RETURN proj
But I'm now struggling to extract the weight of each side, as unwinding it in the inner RETURN clauses makes them exclusive on the intersection, thus returning nothing.
Questions:
Is there any way to make some kind of "selective INTERSECTION", grouping some fields in the process?
Is there an alternative to INTERSECTION to achieve my goal?
Bonus question:
Ideally, after successfully extracting weightWithDoc1 and weightWithDoc26, I'd like to SORT DESC by weightWithDoc1 + weightWithDoc26.
I managed to find an acceptable answer myself:
FOR associated IN INTERSECTION(
(FOR associated
IN 1..1
ANY ${doc1}
${EdgeCollection}
RETURN { _key: associated._key }),
(FOR associated
IN 1..1
ANY ${doc2}
${EdgeCollection}
RETURN { _key: associated._key })
)
LET association1 = FIRST(FOR association IN ${EdgeCollection}
FILTER association._from == CONCAT(${DocCollection.name},'/',MIN([${doc1._key},associated._key])) AND association._to == CONCAT(${DocCollection.name},'/',MAX([${doc1._key},associated._key]))
RETURN association)
LET association2 = FIRST(FOR association IN ${EdgeCollection}
FILTER association._from == CONCAT(${DocCollection.name},'/',MIN([${doc2._key},associated._key])) AND association._to == CONCAT(${DocCollection.name},'/',MAX([${doc2._key},associated._key]))
RETURN association)
SORT (association1.weight+association2.weight) DESC
LIMIT 20
RETURN { _key: associated._key, weight1: association1.weight, weight2: association2.weight }
I believe re-selecting after intersecting is not ideal and not the most performant solution, so I'm leaving it open for now to wait for a better answer.
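For reference, the grouping the question calls a "selective INTERSECTION" can be sketched in plain JavaScript (not AQL): join the two neighbor lists on _key, keep both weights, and sort by their sum. The input shape { _key, weight } matches what the single-side query returns.

```javascript
// Sketch of a "selective intersection": keep only keys present on both
// sides, carry each side's weight, and sort by the weight sum (descending).
function commonNeighbors(side1, side2, limit = 20) {
  const byKey = new Map(side1.map(n => [n._key, n.weight]));
  return side2
    .filter(n => byKey.has(n._key))
    .map(n => ({ _key: n._key, weight1: byKey.get(n._key), weight2: n.weight }))
    .sort((a, b) => (b.weight1 + b.weight2) - (a.weight1 + a.weight2))
    .slice(0, limit);
}
```

This is the shape of logic the bonus question asks for; in AQL it would still require looking up the edge documents for both sides, as in the answer above.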
I have an Elasticsearch index with around 1 million records.
I want to do a multi-prefix search against 2 fields in the index, Name and ID (there are around 10 fields in total).
Does creating an EdgeNGram autocomplete filter make sense at all?
Or am I missing the point of the EdgeNGram?
Here is the code I have for creation of the index:
client.indices.create({
index: 'testing',
// type: 'text',
body: {
settings: {
analysis: {
filter: {
autocomplete_filter: {
type: 'edge_ngram',
min_gram: 3,
max_gram: 20
}
},
analyzer: {
autocomplete: {
type: 'custom',
tokenizer: 'standard',
filter: [
'lowercase',
'autocomplete_filter'
]
}
}
}
}
}
},function(err,resp,status) {
if(err) {
console.log(err);
}
else {
console.log("create",resp);
}
});
Code for searching
client.search({
index: 'testing',
type: 'article',
body: {
query: {
multi_match : {
query: "87041",
fields: [ "name", "id" ],
type: "phrase_prefix"
}
}
}
},function (error, response,status) {
if (error){
console.log("search error: "+error)
}
else {
console.log("--- Response ---");
console.log(response);
console.log("--- Hits ---");
response.hits.hits.forEach(function(hit){
console.log(hit);
})
}
});
The search returns the correct results, so my question is: does creating the edge_ngram filter and analyzer make sense in this case?
Or would this prefix functionality be given out of the box?
Thanks a lot for your info.
It depends on your use case. Let me explain.
You can use ngram for this feature. Let's say your data is london bridge; if your min gram is 1 and max gram is 20, it will be tokenized as l, lo, lon, etc.
The advantage here is that even if you search for bridge, or any token that is part of the generated ngrams, it will be matched.
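What the filter produces can be sketched with a small helper (a plain-JavaScript illustration of edge n-gram generation with min_gram/max_gram as parameters, not the Elasticsearch implementation):

```javascript
// Generate edge n-grams (prefixes) of a single token, from minGram to
// maxGram characters, the way an edge_ngram token filter would.
function edgeNgrams(token, minGram, maxGram) {
  const grams = [];
  for (let n = minGram; n <= Math.min(maxGram, token.length); n++) {
    grams.push(token.slice(0, n));
  }
  return grams;
}
```

For example, edgeNgrams('london', 1, 20) yields ['l', 'lo', 'lon', 'lond', 'londo', 'london'].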
There is one out-of-the-box feature, the completion suggester. It uses an FST model to store the suggestions. The documentation even says it is faster to search but costlier to build. The catch is that it is a prefix suggester, meaning searching for bridge will not bring up london bridge by default. But there are ways to make this work: the workaround is to index an array of tokens, where both london bridge and bridge are tokens.
There is one more, called the context suggester. If you know you are going to search on name or id, it is a better choice than the completion suggester: the completion suggester works over the whole index, while the context suggester works on the part of the index selected by the context.
As you say it is a prefix search, you can go for completion. And you mentioned there are 10 such fields, so if you know up front which field to suggest on, you can go for the context suggester.
one nice answer about edge ngram and completion
completion suggester for the middle of words - I used this solution, and it works like a charm.
You can refer to the documentation for other default options available within suggesters.
I am building an API where several fields in my GET request are optional. I want MongoDB to match all values for those optional fields when the user does not specify them. I have come up with this solution:
db.collection(expenses_collection).find({ username: username, category: { $regex: "/" + category + "/" }, payment_type: { $regex: "/" + payment_type + "/" } })
Where, if category and payment_type are not specified by the user, I set them to ".*":
const {category=".*", payment_type=".*"} = req.query;
However, MongoDB is still not matching any data. Any help is appreciated. Thanks a lot.
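A likely culprit can be demonstrated in isolation with plain JavaScript regex semantics (which mirror how a string passed to $regex is interpreted): the slashes concatenated around the pattern are literal characters, not delimiters, so the default ".*" only matches values that actually contain slashes.

```javascript
// The code above builds the string "/.*/" as the default pattern.
// In a pattern *string* the slashes are literal, so plain values never match.
const pattern = "/" + ".*" + "/";
const withSlashes = new RegExp(pattern).test("cash");    // false: no "/" in "cash"
const withoutSlashes = new RegExp(".*").test("cash");    // true
```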
The issue is with your regex string. To match any string value, you can use this pattern (it matches any string): (.*?)
Consider input documents:
{ _id: 1, name: "John", category: "cat 1", payment_type: "cash" },
{ _id: 2, name: "Jane", category: "cat 2", payment_type: "credit card" }
Usage to match any category field value:
let categoryStr = /(.*?)/
db.exp.find( { category: categoryStr } )
The query returns all documents.
So, in your application, when the category value is not specified, the code can be like this:
if (category is empty or null) { // category not specified by user
categoryStr = /(.*?)/
}
Similarly, for the payment_type field also.
Then query would be:
db.exp.find( {
username: usernameStr,
category: categoryStr,
payment_type: paymentStr
} )
NOTE: The code tests fine with MongoDB NodeJS driver APIs.
Isn't this what $exists is made for?
{category: { $exists: true }, payment_type: { $exists: true }}
Given a collection with, let's say, 1,000,000 entries, each with its own unique, indexed property called number: how can I efficiently find the lowest gap in the number sequence?
An easy example would be a sequence like 1, 2, 3, 4, 6, 7, 10, where I would like to get back the number 5, since this is the lowest missing number in the sequence.
Is there a possible way (maybe aggregation) without the need to query all numbers?
One way of doing this would be with a cursor. With a cursor, you can manually iterate through the documents until you find one that matches your criteria.
var cursor = db.coll.find({}).sort({ number: 1 });
var prev = null;
var firstMissing = null;
// Iterate in sorted order until two consecutive docs differ by more than 1.
while (cursor.hasNext()) {
  var curr = cursor.next();
  if (prev && prev.number + 1 !== curr.number) { firstMissing = prev.number + 1; break; }
  prev = curr;
}
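The same scan can be expressed as a small self-contained function over an already-sorted array of numbers (an illustration of the logic, not a MongoDB query):

```javascript
// Return the lowest missing number in a sorted sequence, or null if none.
function firstGap(sorted) {
  for (let i = 1; i < sorted.length; i++) {
    if (sorted[i] !== sorted[i - 1] + 1) return sorted[i - 1] + 1;
  }
  return null;
}
```

firstGap([1, 2, 3, 4, 6, 7, 10]) returns 5, as in the question's example.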
One way is to get all the numbers and find the ones missing between them.
Here is an aggregation example that avoids having to fetch them all: https://www.mongodb.com/community/forums/t/query-to-find-missing-sequence/123771/2
// Assuming the sample data with sequence numbers from 1 thru 10 as follows:
{ id: 1 },
{ id: 2 },
{ id: 4 },
{ id: 7 },
{ id: 9 },
{ id: 10 }
// And, note the missing numbers are 3, 5, 6 and 8. You can use the following aggregation to find them:
db.collection.aggregate([
{
$group: {
_id: null,
nos: { $push: "$id" }
}
},
{
$addFields: {
missing: { $setDifference: [ { $range: [ 1, 11 ] }, "$nos" ] }
}
}
])
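The $range/$setDifference step translates directly to plain JavaScript, which may help to verify the expected output (illustration only; the range bounds are parameters):

```javascript
// Numbers in [from, to) that are absent from ids, in ascending order —
// the same missing numbers the $setDifference stage above computes.
function missingNumbers(ids, from, to) {
  const present = new Set(ids);
  const missing = [];
  for (let n = from; n < to; n++) {
    if (!present.has(n)) missing.push(n);
  }
  return missing;
}
```

missingNumbers([1, 2, 4, 7, 9, 10], 1, 11) returns [3, 5, 6, 8], matching the sample data.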
I have collection called Products.
Documents of Products look like this:
{
  id: 123456,
  recommendationByCategory: [
    { categoryId: 'a01', recommendation: 3 },
    { categoryId: '0a2', recommendation: 8 },
    { categoryId: '0b10', recommendation: 99 },
    { categoryId: '0b5', recommendation: 1 }
  ]
}
{
  id: 567890,
  recommendationByCategory: [
    { categoryId: 'a7', recommendation: 3 },
    { categoryId: '0a2', recommendation: 1 },
    { categoryId: '0b10', recommendation: 999 },
    { categoryId: '0b51', recommendation: 12 }
  ]
}
I want to find all the docs that contain categoryId: '0a2' in recommendationByCategory, but sorted in ascending order by the recommendation of category '0a2' alone. It must not consider recommendations of other categoryIds. I need id: 567890 followed by id: 123456.
I cannot use aggregation. Is it possible using MongoDB/Mongoose? I tried the sort option 'recommendationByCategory.recommendation': 1, but it's not working.
Expected Query: db.collection('products').find({'recommendaionByCategory.categoryId': categoryId}).sort({'recommendationByCategory.recommendation: 1'})
Expected Result:
[
{doc with id:567890},
{doc with id: 123456}
]
If you cannot use mapReduce or the aggregation pipeline, there is no easy way to both search for the matching embedded document and sort on that document's property.
I would recommend doing the find as you do above (note the typo in the nested key of your find), and then sorting in memory:
const categoryId = '0a2';
const findRec = doc =>
  doc.recommendationByCategory.find(c => c.categoryId === categoryId).recommendation;
db.collection('products')
  .find({ 'recommendationByCategory.categoryId': categoryId })
  .toArray()
  .then(docs => docs.sort((a, b) => findRec(a) - findRec(b)));
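A runnable sketch of that in-memory sort, applied to trimmed-down versions of the question's two sample products (the array find callback and the numeric comparator are the important parts; Array.prototype.find takes a function, not an object):

```javascript
// Sort docs ascending by the recommendation of the '0a2' category entry only.
const categoryIdToSort = '0a2';
const recOf = doc =>
  doc.recommendationByCategory.find(c => c.categoryId === categoryIdToSort).recommendation;

const sampleDocs = [
  { id: 123456, recommendationByCategory: [
    { categoryId: 'a01', recommendation: 3 },
    { categoryId: '0a2', recommendation: 8 },
  ] },
  { id: 567890, recommendationByCategory: [
    { categoryId: '0a2', recommendation: 1 },
    { categoryId: '0b10', recommendation: 999 },
  ] },
];

sampleDocs.sort((a, b) => recOf(a) - recOf(b));
// sampleDocs[0].id is now 567890 (recommendation 1), then 123456 (recommendation 8)
```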
In regard to the Aggregation Pipeline being resource-intensive: it is several orders of magnitude more efficient than a Map-Reduce query, and solves your particular issue. Either you accept that this task will be run at a certain cost and frequency, taking into account Mongo's built-in caching, or you restructure your document schema to allow you to make this query more efficiently.