I'm following this tutorial: https://docs.mongodb.com/manual/core/index-text/
This is the sample data:
db.stores.insert(
[
{ _id: 1, name: "Java Hut", description: "Coffee and cakes" },
{ _id: 2, name: "Burger Buns", description: "Gourmet hamburgers" },
{ _id: 3, name: "Coffee Shop", description: "Just coffee" },
{ _id: 4, name: "Clothes Clothes Clothes", description: "Discount clothing" },
{ _id: 5, name: "Java Shopping", description: "Indonesian goods" }
]
)
Case 1: db.stores.find( { $text: { $search: "java coffee shop" } } ) => FOUND
Case 2: db.stores.find( { $text: { $search: "java" } } ) => FOUND
Case 3: db.stores.find( { $text: { $search: "coff" } } ) => NOT FOUND
I was expecting case 3 to be FOUND because the query matches a part of "java coffee shop"
Case 3 will not work with the $text operator, and the reason is how MongoDB creates text indexes.
MongoDB takes the values of the text-indexed fields and creates a separate index entry for each unique word (not character!) in the string.
So this means that, in your case, for the first document:
the name field produces 2 index terms:
java
hut
and the description field produces 2 index terms ("and" is an English stop word and is dropped):
coffee
cakes
The $text operator compares the $search values against these index terms, and that's why "coff" will not match anything.
If you really want to take advantage of the index you have to use the $text operator, but it does not give you the substring flexibility you are looking for.
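A rough in-memory model of this tokenization (the lowercase/split below is a simplification; the real index also strips stop words and stems each term):

```javascript
// Simplified model of how a text index turns a field value into terms:
// lowercase the string and split on whitespace. The real tokenizer also
// removes stop words ("and") and stems the remaining words.
const terms = "Coffee and cakes".toLowerCase().split(/\s+/);

console.log(terms);                    // [ 'coffee', 'and', 'cakes' ]
console.log(terms.includes("coffee")); // true  — whole-term match succeeds
console.log(terms.includes("coff"));   // false — "coff" is not a whole term
```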
solution:
You can simply use $regex with the case-insensitive option (i) and optimize your query with skip and limit.
Be aware that if the query has to scan all documents and the collection is large, $regex can cause performance issues.
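As a sketch (collection and field names taken from the question), the query could look like db.stores.find({ description: { $regex: "coff", $options: "i" } }).skip(0).limit(10). The matching behaviour, modeled in plain JavaScript against the sample data:

```javascript
// The question's sample documents, held in memory for illustration
const stores = [
  { _id: 1, name: "Java Hut", description: "Coffee and cakes" },
  { _id: 2, name: "Burger Buns", description: "Gourmet hamburgers" },
  { _id: 3, name: "Coffee Shop", description: "Just coffee" },
  { _id: 4, name: "Clothes Clothes Clothes", description: "Discount clothing" },
  { _id: 5, name: "Java Shopping", description: "Indonesian goods" },
];

// A case-insensitive $regex on description behaves like this RegExp test:
// it matches substrings, so "coff" now finds the coffee documents
const results = stores.filter((d) => /coff/i.test(d.description));

console.log(results.map((d) => d._id)); // [ 1, 3 ]
```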
You can also check this article https://medium.com/coding-in-depth/full-text-search-part-1-how-to-create-mongodb-full-and-partial-text-search-c09c0bae17a3 and maybe use wildcard indexes for that, but I do not know whether that is good practice.
Related
I have a document that looks like:
{
_id: "....",
hostname: "mysite.com",
text: [
{
source: "this is text. this is some pattern",
...
},
{
source: "....",
...
}
]
}
and I am trying to delete the items from the text array which match a specific condition in my query given as:
db.getCollection('TM').updateMany(
{hostname: "mysite.com"},
{
$pull: {
"text.source": /this is some pattern/
}
},
{ multi: true }
)
Here I want to delete all the items from the array where the value inside source matches this is some pattern. When I execute this query, it gives an error saying: Cannot use the part (source) of (text.source) to traverse the element with error code 28.
What is the way to achieve this?
it gives an error saying: Cannot use the part (source) of (text.source) to traverse the element with error code 28.
The $pull syntax in your update method is incorrect.
The corrected syntax is below, and you can use $regex to match elements by a specific pattern.
The "text.source" condition in the filter part selects the main documents; it is optional.
text: { source: ... } inside $pull filters the subdocuments and pulls the matching elements.
db.getCollection('TM').updateMany(
{
hostname: "mysite.com",
"text.source": { $regex: "this is some pattern" }
},
{
$pull: {
text: { source: { $regex: "this is some pattern" } }
}
}
)
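What the corrected update does to a matching document can be sketched in memory like this (document shape from the question; the second array element is made up for contrast):

```javascript
// A document shaped like the one in the question
const doc = {
  hostname: "mysite.com",
  text: [
    { source: "this is text. this is some pattern" },
    { source: "something unrelated" },
  ],
};

// $pull with { source: { $regex: ... } } removes every array element whose
// source matches the pattern — equivalent to this in-memory filter:
doc.text = doc.text.filter((t) => !/this is some pattern/.test(t.source));

console.log(doc.text.length);    // 1
console.log(doc.text[0].source); // "something unrelated"
```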
Imagine I have an array of objects, available before the aggregate query:
const groupBy = [
{
realm: 1,
latest_timestamp: 1318874398, //Date.now() values, usually different to each other
item_id: 1234, //always the same
},
{
realm: 2,
latest_timestamp: 1312467986, //actually it's $max timestamp field from the collection
item_id: 1234,
},
{
realm: ..., //there are many of them
latest_timestamp: ...,
item_id: 1234,
},
{
realm: 10,
latest_timestamp: 1318874398, //but sometimes they can be the same
item_id: 1234,
},
]
And collection (example set available on MongoPlayground) with the following schema:
{
realm: Number,
timestamp: Number,
item_id: Number,
field: Number, //any other useless fields in this case
}
My problem is, how to $group the values from the collection via the aggregation framework by using the already available set of data (from groupBy) ?
What has been tried already?
Okay, let's skip the crap ideas, like:
for (const element of groupBy) {
//array of `find` queries
}
My current working aggregation query is something like this:
//first stage
{
$match: {
  "item_id": 1234,
  "realm": { $in: [1, 2, 3, 4, ..., 10] }
}
},
{
$group: {
_id: {
realm: '$realm',
},
latest_timestamp: {
$max: '$timestamp',
},
data: {
$push: '$$ROOT',
},
},
},
{
$unwind: '$data',
},
{
$addFields: {
'data.latest_timestamp': {
$cond: {
if: {
$eq: ['$data.timestamp', '$latest_timestamp'],
},
then: '$latest_timestamp',
else: '$$REMOVE',
},
},
},
},
{
$replaceRoot: {
newRoot: '$data',
},
},
//At last, after these stages, I can do the useful job
but I find it a bit clunky, and I have heard that using mapReduce could solve my problem a bit faster than this query (though the official docs don't sound promising about it). Is that true?
As of now, I am using 4 or 5 stages before I start working with the documents that are actually useful to me.
Recent update:
I have checked the $facet stage and found it promising for this particular case. It will probably help me out.
For what it's worth:
After the necessary stages I receive documents from which I build a representative cluster chart, which you may also know as a heatmap.
After that I iterate over each document (or array of objects) one by one to find its correct x and y coordinates, which should end up as:
[
{
x: x (number, actual $price),
y: y (number, actual $realm),
value: price * quantity,
quantity: sum_of_quantity_on_price_level
}
]
As for now, it's old awful code with for loops nested inside each other, but in the future I will be using the $facet => $bucket operators for that kind of job.
So, I have found an answer to my question in a different but related way.
I was thinking about using the $facet operator, and to be honest it's still an option, but using it as below is bad practice:
//building $facet query before aggregation
const ObjectQuery = {}
for (const realm of realms) {
  ObjectQuery[realm.name] = [ ... ]
}
//mongoose query here
Model.aggregate([
  {
    $facet: ObjectQuery
  },
  ...
])
So, I have chosen a $project stage with a $switch operator to filter results the way $group does.
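A hedged sketch of that idea (the variable names are mine, not from the original code): the pre-computed groupBy array can be turned into $switch branches, so each document gets its group's latest_timestamp without a $group/$unwind round-trip:

```javascript
// Pre-computed data from the question (two entries shown for brevity)
const groupBy = [
  { realm: 1, latest_timestamp: 1318874398, item_id: 1234 },
  { realm: 2, latest_timestamp: 1312467986, item_id: 1234 },
];

// Build an $addFields stage whose $switch maps each realm to its
// pre-computed latest_timestamp; unmatched realms get no field ($$REMOVE)
const addLatestTimestamp = {
  $addFields: {
    latest_timestamp: {
      $switch: {
        branches: groupBy.map((g) => ({
          case: { $eq: ["$realm", g.realm] },
          then: g.latest_timestamp,
        })),
        default: "$$REMOVE",
      },
    },
  },
};

// One branch per groupBy entry
console.log(addLatestTimestamp.$addFields.latest_timestamp.$switch.branches.length); // 2
```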
Also, using mapReduce could solve this problem, but for some reason the official Mongo docs recommend avoiding it in favor of the aggregation pipeline: the $group and $merge operators instead.
I have inserted the following documents into my events collection:
db.events.insert(
[
{ _id: 1, name: "Amusement Ride", description: "Fun" },
{ _id: 2, name: "Walk in Mangroves", description: "Adventure" },
{ _id: 3, name: "Walking in Cypress", description: "Adventure" },
{ _id: 4, name: "Trek at Tikona", description: "Adventure" },
{ _id: 5, name: "Trekking at Tikona", description: "Adventure" }
]
)
I've also created an index in the following way:
db.events.createIndex( { name: "text" } )
Now when I execute the following query (Search - Walk):
db.events.find({
'$text': {
'$search': 'Walk'
},
})
I get these results:
{ _id: 2, name: "Walk in Mangroves", description: "Adventure" },
{ _id: 3, name: "Walking in Cypress", description: "Adventure" }
But when I search Trek:
db.events.find({
'$text': {
'$search': 'Trek'
},
})
I get only one result:
{ _id: 4, name: "Trek at Tikona", description: "Adventure" }
So my question is: why didn't it return
{ _id: 4, name: "Trek at Tikona", description: "Adventure" },
{ _id: 5, name: "Trekking at Tikona", description: "Adventure" }
When I searched for walk, it returned the documents containing both walk and walking. But when I searched for Trek, it returned only the document containing trek, when it should have returned both trek and trekking.
MongoDB text search uses the Snowball stemming library to reduce words to an expected root form (or stem) based on common language rules. Algorithmic stemming provides a quick reduction, but languages have exceptions (such as irregular or contradicting verb conjugation patterns) that can affect accuracy. The Snowball introduction includes a good overview of some of the limitations of algorithmic stemming.
Your example of walking stems to walk and matches as expected.
However, your example of trekking stems to trekk so does not match your search keyword of trek.
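This can be modeled directly: text search compares the stemmed keyword against the stemmed index terms for equality. In the sketch below the stems are the Snowball outputs mentioned above, hard-coded rather than computed:

```javascript
// Snowball stems for the relevant words (taken from the explanation above)
const stem = { walking: "walk", walk: "walk", trekking: "trekk", trek: "trek" };

// Text search matches when the stemmed keyword equals a stemmed index term
const matches = (indexedWord, keyword) => stem[indexedWord] === stem[keyword];

console.log(matches("walking", "walk"));  // true  -> "Walking in Cypress" is found
console.log(matches("trekking", "trek")); // false -> "Trekking at Tikona" is missed
```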
You can confirm this by explaining your query and reviewing the parsedTextQuery information which shows the stemmed search terms used:
db.events.find({$text: {$search: 'Trekking'} }).explain().queryPlanner.winningPlan.parsedTextQuery
{
"terms" : [
"trekk"
],
"negatedTerms" : [ ],
"phrases" : [ ],
"negatedPhrases" : [ ]
}
You can also check expected Snowball stemming using the online Snowball Demo or by finding a Snowball library for your preferred programming language.
To work around exceptions that might commonly affect your use case, you could consider adding another field to your text index with keywords to influence the search results. For this example, you would add trek as a keyword so that the event described as trekking also matches in your search results.
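Sketching that workaround for this example: the text index would cover both fields (e.g. db.events.createIndex({ name: "text", keywords: "text" }), an assumed index shape), so the search term trek now matches the manually added keyword even though the stemmed name term is trekk:

```javascript
// Stemmed terms the index would hold for _id 5 once "trek" is added as a
// keyword: the stemmed name terms plus the manually curated keyword.
// (Term list is illustrative; stop words like "at" are omitted.)
const indexedTerms = ["trekk", "tikona", "trek"];

console.log(indexedTerms.includes("trek")); // true — the event now matches
```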
There are other approaches for more accurate inflection which are generally referred to as lemmatization. Lemmatization algorithms are more complex and start heading into the domain of natural language processing. There are many open source (and commercial) toolkits that you may be able to leverage if you want to implement more advanced text search in your application, but these are outside the current scope of the MongoDB text search feature.
I have a mongoose model in which some fields are like :
var AssociateSchema = new Schema({
personalInformation: {
familyName: { type: String },
givenName: { type: String }
}
})
I want to perform a '$regex' on the concatenation of familyName and givenName (something like 'familyName + " " + 'givenName'), for this purpose I'm using aggregate framework with $concat inside $project to produce a 'fullName' field and then '$regex' inside $match to search on that field. The code in mongoose for my query is:
Associate.aggregate([
{ $project: {fullName: { $concat: [
'personalInformation.givenName','personalInformation.familyName']}}},
{ $match: { fullName: { 'active': true, $regex: param, $options: 'i' } } }
])
But it's giving me error:
MongoError: $concat only supports strings, not double
on the first stage of my aggregate pipeline, i.e. the $project stage.
Can anyone point out what I'm doing wrong ?
I also got this error and then discovered that indeed one of the documents in the collection was to blame. The way I fished it out was by filtering by field type, as explained in the docs:
db.addressBook.find( { "zipCode" : { $type : "double" } } )
I found the field had the value NaN, which to my eyes wouldn't be a number, but MongoDB stores it as a double.
Looking at your code, I'm not sure why $concat isn't working for you unless you've had some integers sneak into some of your document fields. Have you tried having a $-sign in front of your concatenated values? as in, '$personalInformation.givenName'? Are you sure every single familyName and givenName is a string, not a double, in your collection? All it takes is one double for your $concat to fold.
In any case, I had a similar type mismatch problem with actual doubles. $concat indeed supports only strings, and usually, all you'd do is cast any non-strings to strings.. but alas, at the time of this writing MongoDB 3.6.2 does not yet support integer/double => string casting, only date => string casting. Sad face.
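For readers on newer servers: MongoDB 4.0 later added $toString/$convert, which makes the cast direct. A sketch assuming 4.0+ (so not applicable to the 3.6.2 setup described above):

```javascript
// Hypothetical $project stage assuming MongoDB 4.0+, where $toString exists
// and casts doubles/ints to strings before $concat sees them
const castStage = {
  $project: {
    fullName: {
      $concat: [
        { $toString: "$personalInformation.givenName" },
        " ",
        { $toString: "$personalInformation.familyName" },
      ],
    },
  },
};

console.log(castStage.$project.fullName.$concat.length); // 3
```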
That said, try adding this projection hack at the top of your query. This worked for me as a typecast. Just make sure you provide a long enough byte length (128-byte name is pretty long so you should be okay).
{
  $project: {
    castedGivenName: {
      $substrBytes: [ '$personalInformation.givenName', 0, 128 ]
    },
    castedFamilyName: {
      $substrBytes: [ '$personalInformation.familyName', 0, 128 ]
    }
  }
},
{
$project: {
fullName: {
$concat: [
'$castedGivenName',
'$castedFamilyName'
]
}
}
},
{
  $match: { active: true, fullName: { $regex: param, $options: 'i' } }
}
I managed to make it work by using the $substr method, so the $project part of my aggregate pipeline is now:
{
  $project: {
    fullName: {
      $concat: [
        { $substr: ['$personalInformation.givenName', 0, -1] }, ' ', { $substr: ['$personalInformation.familyName', 0, -1] }
      ]
    }
  }
}
I have a food db with listings similar to:
{
Name: "burger",
ingredients: [
{Item:"bread"},
{Item:"cheese"},
{Item:"tomato"}
]
}
How can I find documents that have the most similar items in ingredients?
First of all, your data should be remodelled as below:
{
name: "Burger",
ingredients: [
"bread",
"cheese",
"tomato",
"beef"
]
}
The extra "Item" does not add any additional information nor does it help accessing the data in any way.
Next, you need to create a text index. The docs state that
text indexes can include any field whose value is a string or an array of string elements.
So we simply do a
db.collection.createIndex({"ingredients":"text"})
Now we can do a $text search:
db.collection.find(
{ $text: { $search: "bread beef" } },
{ score: { $meta: "textScore" } }
).sort( { score: { $meta: "textScore" } } )
which should give you the most relevant documents.
However, what you could also do is a non-text search for direct matches:
db.collection.find({ingredients:"beef"})
or for multiple ingredients
db.collections.find({ ingredients: { $all: ["beef","bread"] } })
So for searching by user input, you can use the text search and for search by selected ingredients, you can use the non-text search.
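The $all match can be illustrated in memory (the second document is made up for contrast):

```javascript
// Remodelled documents; "Salad" is an invented document for contrast
const docs = [
  { name: "Burger", ingredients: ["bread", "cheese", "tomato", "beef"] },
  { name: "Salad", ingredients: ["tomato", "cheese", "lettuce"] },
];

// { ingredients: { $all: ["beef", "bread"] } } matches documents whose
// array contains every listed ingredient, like this filter:
const wanted = ["beef", "bread"];
const results = docs.filter((d) => wanted.every((i) => d.ingredients.includes(i)));

console.log(results.map((d) => d.name)); // [ 'Burger' ]
```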
Your best chance is to store the ingredients in a single text field, i.e.:
{ ingredients: "bread cheese tomato" }
Then you have to create a text index and query for similarity:
db.your_collection.find({ $text: { $search: "tomato" } }, { score: { $meta: "textScore" } }).sort({ score: { $meta: "textScore" } })
to get the most relevant documents.